User talk:Citation bot/Archive 28


Arbitrary cite type changes

Some observations of Citation bot's template-changing behaviour from today:

  • In this edit, the bot classifies PC Gamer -- within the same edit -- as both a magazine and a newspaper (in the latter case renaming it "Pc Gamer"). PC Gamer has a website and a magazine. However, none of the online articles makes it into the mag and only a few printed ones ever get online, so why conflate the two? In no instance is PC Gamer a newspaper. The newspaper change also occurred here.
  • Citation bot changes GameStar, as seen here, and Rock Paper Shotgun, here, into journals. GameStar is a magazine–website hybrid like PC Gamer, while Rock Paper Shotgun is a website only; neither is a journal. I previously complained that Rock Paper Shotgun was being turned into a newspaper for no reason.
  • The bot inconsistently classifies MCV/Develop as a magazine and a journal.

These are from only the past handful of edits that popped up on my watchlist. I find these kinds of changes arbitrary; given the inconsistency, and the fact that these template changes do not actually affect the rendered article, the bot should not perform them unless there is a clear point in doing so. IceWelder [] 08:31, 1 October 2021 (UTC)

Added new array of works that often are both magazines and websites that will not be changed. AManWithNoPlan (talk) 11:59, 1 October 2021 (UTC)
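For illustration only, a minimal sketch of how such a do-not-change list might look; the constant name, its entries and the helper are assumptions, not the bot's actual source.
<syntaxhighlight lang="php">
<?php
// Hypothetical sketch: works that publish both a magazine and a website,
// for which the bot should leave the existing work parameter alone.
const DUAL_MAGAZINE_WEBSITES = ['pc gamer', 'gamestar', 'mcv/develop'];  // example entries

function skip_work_rename(string $work): bool {
  return in_array(strtolower(trim($work)), DUAL_MAGAZINE_WEBSITES, true);
}
</syntaxhighlight>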
Thus, {{fixed}} AManWithNoPlan (talk) 16:02, 4 October 2021 (UTC)

Job stalled, refuses to die

My overnight batch job of 2,199 articles ("Vehicles, part 2 of 3") stalled at 08:10 UTC. This can be seen in the latest set of bot edits, where the last edit to this set is item #140, an edit[1] to Driver (series).

I tried to remedy this at about 08:40 by using https://citations.toolforge.org/kill_big_job.php, which promptly responded Existing large job flagged for stopping.

But over an hour later, attempts to start a new job (with a new list) using https://citations.toolforge.org/linked_pages.php still get a response of Run blocked by your existing big run.

Meanwhile, the bot was happy to process an individual page request from me: see [2] at 09:49. --BrownHairedGirl (talk) • (contribs) 10:05, 2 October 2021 (UTC)

Rebooted. AManWithNoPlan (talk) 11:28, 2 October 2021 (UTC)
Thanks, @AManWithNoPlan. I tried again to start a new batch, and the bot began work promptly.[3] --BrownHairedGirl (talk) • (contribs) 11:46, 2 October 2021 (UTC)


{{fixed}} the underlying bug. AManWithNoPlan (talk) 16:01, 4 October 2021 (UTC)

Thanks for your prompt and hard work, @AManWithNoPlan. --BrownHairedGirl (talk) • (contribs) 19:18, 4 October 2021 (UTC)

Job dropped

My overnight batch of 2,198 articles ("Vehicles, part 3 of 3") was dropped at 08:00 UTC this morning, after this edit[4] to National Tyre Distributors Association, which was #661 in the list. See the bot contribs for that period, where that edit is #126 in the contribs list.

When I spotted it, I was able to start a new batch at 10:48[5], so the overnight batch wasn't stuck like yesterday.

This is a bit tedious. --BrownHairedGirl (talk) • (contribs) 11:14, 3 October 2021 (UTC)

Same thing's happened for me with my overnight batch of 2,200 this evening, which got dropped a few minutes past midnight UTC after this edit to TASC (think-tank) (#85 in the list). Only noticed it just now, and I've just started it going again from where it left off - it's at least started on the items from the newly-resumed job, so fingers crossed that the rest of the batch goes through OK.
(I'm presuming that it's actually dropped, as opposed to stalling, given that it hasn't [yet] spat out a Run blocked by your existing big run in response to my actions to resume the run. Yup, definitely dropped, since it's accepted [and started processing] the rest of the batch I fed it just now.) Whoop whoop pull up Bitching BettyAverted crashes 03:56, 4 October 2021 (UTC)
Sorry to hear that, @Whoop whoop pull up. These job drops are annoying. But the worst thing is the stuck or stalled jobs, like in the sections above and below. Since #Another job stalled, refuses to die stopped almost 24 hours ago, I haven't been able to start another batch job. It just keeps on giving me Run blocked by your existing big run.
I have lists of about 50,000 articles with bare URLs piled up and ready to process, and the 30+ hours of processing time lost would probably have cleared over 6,000 of them :( --BrownHairedGirl (talk) • (contribs) 10:19, 4 October 2021 (UTC)

{{fixed}} the underlying bug. AManWithNoPlan (talk) 16:01, 4 October 2021 (UTC)

Another job stalled, refuses to die

After my big overnight job of 2,198 pages was dropped (see above #Job dropped), I ran another small job of 100 pages. That was processed successfully.

I then resumed the overnight job "Vehicles, part 3 of 3 resumed" (1,537 pages) ... but it has stalled.

See the latest bot contribs: that batch started processing at 11:41[6], but stalled after its fifth edit[7], at 11:42.

At 12:14 I used https://citations.toolforge.org/kill_big_job.php to try to kill this stalled job. Now 15 minutes later, I still can't start a new job: the response is Run blocked by your existing big run. --BrownHairedGirl (talk) • (contribs) 12:31, 3 October 2021 (UTC)

I have been re-trying intermittently throughout the day, but the bot still says Run blocked by your existing big run. --BrownHairedGirl (talk) • (contribs) 19:25, 3 October 2021 (UTC)
{{fixed}} with bot reboot. AManWithNoPlan (talk) 13:04, 4 October 2021 (UTC)
Thanks, @AManWithNoPlan. I was able to start a new batch, which is now being processed.
Rather than resubmitting the stalled batch, I decided to try individual page requests for its first few unprocessed pages.
The batch is listed at https://en.wikipedia.org/w/index.php?title=User:BrownHairedGirl/Articles_with_bare_links&oldid=1047938690, where the first unprocessed page is #5: Need for Speed: Hot Pursuit 2. I did a toolbar request on that page, which caused it to go into an apparently infinite loop of:
~Renamed "magazine" -> "journal"
~Renamed "journal" -> "magazine"
It's still doing that now. Please can you kill the bot task on that page, and look into what causes the loop?
Thanks. --BrownHairedGirl (talk) • (contribs) 13:52, 4 October 2021 (UTC)
Completely rebooted the bot again. That seems to be the one problem that really kills the bot: an infinite loop. AManWithNoPlan (talk) 14:45, 4 October 2021 (UTC)

{{fixed}} the infinite loop and added a test to make sure that does not happen again. AManWithNoPlan (talk) 16:01, 4 October 2021 (UTC)

Thanks again. --BrownHairedGirl (talk) • (contribs) 19:20, 4 October 2021 (UTC)

Caps: GSA Today

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 14:38, 3 October 2021 (UTC)
What happens
[8]
What should happen
[9]
We can't proceed until
Feedback from maintainers


Caps: EFORT Open Reviews

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 01:52, 5 October 2021 (UTC)
What should happen
[10]
We can't proceed until
Feedback from maintainers


Caps: APMIS

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 01:54, 5 October 2021 (UTC)
What should happen
[11]
We can't proceed until
Feedback from maintainers


Another batch job stalled

My batch job "Food, part 1 of 6" (595 pages) has been stalled for over an hour, since this edit[12] to Bill Knapp's.

@AManWithNoPlan, please can you take a peek? --BrownHairedGirl (talk) • (contribs) 20:31, 5 October 2021 (UTC)

I saw that and I rebooted the bot. No idea what is wrong. Will think about infinite loops. AManWithNoPlan (talk) 20:52, 5 October 2021 (UTC)
Thanks, @AManWithNoPlan. After I posted here, I tried starting a new batch, and that worked. --BrownHairedGirl (talk) • (contribs) 21:24, 5 October 2021 (UTC)
I discovered one way that the bot can die and not reset the big job counter and fixed it. At some point I will add a heartbeat that will reset if the bot has been nonresponsive for an hour. AManWithNoPlan (talk) 00:43, 6 October 2021 (UTC)
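One possible shape for such a heartbeat, purely as a sketch; the file paths and the flag mechanism are assumptions rather than the bot's real design.
<syntaxhighlight lang="php">
<?php
// Hypothetical heartbeat: touch a timestamp after every processed page and
// clear the "big job running" flag if nothing has happened for an hour.
const HEARTBEAT_FILE = '/tmp/citation_bot_heartbeat';   // assumed path
const BIG_JOB_FLAG   = '/tmp/citation_bot_big_job';     // assumed flag file

function beat(): void {
  touch(HEARTBEAT_FILE);                                 // call after each page
}

function reset_if_stalled(): void {
  $last = @filemtime(HEARTBEAT_FILE);
  if ($last !== false && time() - $last > 3600 && file_exists(BIG_JOB_FLAG)) {
    unlink(BIG_JOB_FLAG);                                // lets a new batch start
  }
}
</syntaxhighlight>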
Something {{fixed}}. AManWithNoPlan (talk) 20:57, 7 October 2021 (UTC)

Bot attempts to edit category pages within categories

Status
{{wontfix}}, probably {{notabug}}
Reported by
Abductive (reasoning) 18:08, 7 October 2021 (UTC)
What happens
It's kind of hard to describe, but the bot seems to think that the subcategory pages within categories should be run through the process as if they were an article.
We can't proceed until
Feedback from maintainers


So, for example, in Category:Human rights in Saudi Arabia there are 7 articles and 4 categories. The bot counts these as 11 and reports back for each category as if it were an article, for instance:

No changes needed. Category:Saudi Arabian human rights activists

Presumably if there was a citation with correctable errors on the subcategory page the bot would make an edit. But I have never seen a category page with a citation in it, and I have never seen the bot edit a category page. Even though the bot quickly runs the category pages it is presented with, it must take a little time to do nothing to each one, and in the aggregate this wastes bot time. If possible, could the bot be instructed to ignore subcategory pages? Abductive (reasoning) 18:08, 7 October 2021 (UTC)

I've actually seen at least three categories with citations on them, including two that were fixed by Citation bot: Category:Fa'afafine (citation fixed by Citation bot with this edit during a runthrough of Category:Third gender), Category:Feminizing hormone therapy (citation fixed by Citation bot with this edit during a runthrough of Category:Trans women; citation later removed from page entirely), and Category:Transgender studies (Citation bot found nothing to fix). It's rare (in fact, it looks like Citation bot has made only 40 edits in the Category namespace ever, judging from this), but it does happen. Whoop whoop pull up Bitching BettyAverted crashes 05:08, 10 October 2021 (UTC)
Not really needed, some categories do have citations in them, and if they don't, they get very very speedily skipped. Headbomb {t · c · p · b} 20:16, 7 October 2021 (UTC)
pages with no citations are VERY fast. AManWithNoPlan (talk) 20:58, 7 October 2021 (UTC)

Untitled_new_bug

Status
{{notabug}} that I can see
Reported by
Taksen (talk) 06:14, 8 October 2021 (UTC)
We can't proceed until
Feedback from maintainers


Something went wrong at Fire of Moscow (1812). I cannot figure out what it is; please take a look. Taksen (talk) 06:14, 8 October 2021 (UTC)

Bot stalls on page, crashes batch job

My big overnight batch "South Asia, part 5 of 6" (2,196 pages) stalled after this 09:23 edit[13] to page 2162/2196: see edit #135 on this set of bot contribs.

I left it until about 09:48 before trying to start a new batch ("Food, part 3 of 6", 593 pages). The bot made its first edit to that batch at 09:49[14]. I had not needed to kill the first job.

I then set about working on the remaining 34 pages: run Citation bot via the toolbar, let the page finish, run it on the next page ... then do manual follow-up on each page.

The first of those missed pages on which I invoked the bot was #2163 of "South Asia, part 5 of 6": Sambalpur (Lok Sabha constituency). That stalled, so I went on and processed the next nine. After more than an hour, the bot request on Sambalpur (Lok Sabha constituency) timed out with a 502 Bad Gateway error.

It seems that in batch mode, the bot drops the stalled page more promptly. However, it should not also kill the batch, since the next 9 pages were fine.

I know that @AManWithNoPlan has recently put a lot of work into these stalling issues, but it's not quite fixed yet. --BrownHairedGirl (talk) • (contribs) 11:35, 8 October 2021 (UTC)

I have just finished manually processing the last 33 pages which the bot skipped. The bot had no problem with the last 33 pages; only Sambalpur (Lok Sabha constituency) was a problem. --BrownHairedGirl (talk) • (contribs) 13:21, 8 October 2021 (UTC)
{{wontfix}} for now. Still investigating. AManWithNoPlan (talk) 13:17, 12 October 2021 (UTC)

Another stalled job

See the latest bot contribs: my batch job "Food, part 5 of 6" (590 pages) stalled after this edit[15] (266/590) to Rodrick Rhodes, which is #16 in that contribs list.

I can't start a new batch. --BrownHairedGirl (talk) • (contribs) 21:25, 8 October 2021 (UTC)

{{wontfix}} for now. Still investigating. AManWithNoPlan (talk) 13:17, 12 October 2021 (UTC)

Caps: ESMO Open

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 19:25, 10 October 2021 (UTC)
What happens
[16]
What should happen
[17]
We can't proceed until
Feedback from maintainers


CAPS: TAPPI J / TAPPI Journal

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 11:57, 11 October 2021 (UTC)
What should happen
[18]
We can't proceed until
Feedback from maintainers


Local mode?

Has the ability to run this bot offline (with sufficient API rate limiting, both for the mediawiki API and the data provider APIs) been considered? That's one way to solve the discussions over capacity. It seems strange to me that in this day and age a computing resource like this has limited capacity. Enterprisey (talk!) 02:15, 14 September 2021 (UTC)

I'd be more than happy to host something, if someone can walk me through it. I have a spare Windows machine, but I could install another OS on it. That said, a better scheduler/queuing system would fix all issues. Put all individual requests at the top of the queue, put all batch requests at the bottom, problem solved. Headbomb {t · c · p · b} 03:07, 14 September 2021 (UTC)
I would also be happy to run my bare URL batches offline (if walked through it), and could use my backup machine for it (dual boot Win10/OpenSUSE). I would welcome being able to run bigger batches, instead of having to chop them up into small chunks.
However, AFAICS the bottleneck is in the data provider APIs, which already seem to be at their limit when Citation bot is using all 4 channels: my recent batches of biographies run at up to ~350 pages/hr when there are zero or one other jobs running, but slow to ~200 pages/hr with two other concurrent jobs, and down to as low as 80 pages/hr with three other jobs. The slowdown is especially severe if the other jobs include pages with lots of refs: the bot's overall edit rate can fall from a peak of ~8 pages/min to 0.2 pages/min.
So allowing more processes to use the APIs may seriously degrade the throughput of batch jobs. To avoid wasting the APIs' limited resources, we would still need some restraints on low-return speculative trawls such as those which I documented above.[19][20][21]
It seems to me that the real issue here is the limited capacity of the APIs. Is there any way those API bottlenecks can be addressed? @AManWithNoPlan: who runs these APIs? --BrownHairedGirl (talk) • (contribs) 09:20, 14 September 2021 (UTC)
Bibcodes use an API key. URLs use the Wikipedia Zotero service. Those are two bottlenecks. Neither of those are accessed when run in slow mode. If you look at the category code, it actually can be run on the command line - I assume that works. Of course, you need keys etc. AManWithNoPlan (talk) 11:35, 14 September 2021 (UTC)
@AManWithNoPlan: by slow mode, do you mean with the option "Thorough mode" selected? --BrownHairedGirl (talk) • (contribs) 13:07, 14 September 2021 (UTC)
Exactly. AManWithNoPlan (talk) 13:47, 14 September 2021 (UTC)
Thanks. So am I right in thinking that means that if this thorough-mode task was run from a user's own computer, it would share no APIs or other resources with Citation bot?
If so, it could run at whatever speed WP:BAG allows, and take a lot of load off Citation bot. --BrownHairedGirl (talk) • (contribs) 14:08, 14 September 2021 (UTC)
Yes, Zotero/URLs is shared. Although, you probably don't have a BibCode API key, so you would need one of those - each key would be different. AManWithNoPlan (talk) 14:11, 14 September 2021 (UTC)
Just to clarify, why would slow/thorough mode make fewer queries, and how can it get away with doing that? Enterprisey (talk!) 08:16, 15 September 2021 (UTC)
Slow/thorough mode expands URLs and adds bibcodes. That makes the process take longer (slower) since there are more queries per page. AManWithNoPlan (talk) 11:38, 15 September 2021 (UTC)
Accessing the bibcode API is still required for slow mode, then? I interpreted Neither of those are accessed when run in slow mode as saying that the APIs are not accessed during slow mode, but that doesn't make total sense to me. Enterprisey (talk!) 23:32, 15 September 2021 (UTC)
My phone ate my edit. Should be except in slow mode. AManWithNoPlan (talk) 00:58, 16 September 2021 (UTC)

Flag as {{fixed}} since it seems to already exist as a feature. AManWithNoPlan (talk) 14:49, 15 October 2021 (UTC)

Caching?

I understand from the earlier arguments that the bot wastes much time looking up the metadata for citations that it already had processed in earlier runs, especially when run on batches of pages. How much (if any) storage space is available to the bot? If it doesn't already, could it cache the metadata of citations it processes (or the resulting template code, or maybe just a hash by which to recognize a previously-seen citation), so that it can waste less time when encountering an already-processed citation? —2d37 (talk) 03:04, 16 September 2021 (UTC)

URL data is cached on the Zotero server - which the bot queries. Whether or not DOIs work is cached. Nothing else is cached. Most of the APIs are really fast other than URLs. BibCodes might be useful, but since queries are batched there would be little to gain. AManWithNoPlan (talk) 13:03, 16 September 2021 (UTC)
{{wontfix}} at this time, but on the to do list. AManWithNoPlan (talk) 14:48, 15 October 2021 (UTC)

PMC links vs. URL and accessdate data

Citation bot made some edits to an article with the comment "Removed URL that duplicated unique identifier. Removed accessdate with no specified URL.". Some of the journal citations did indeed have PMC identification numbers, but the fulltext URL links were not to the PMC fulltext, so they weren't straight duplicates. I'm not sure it's safe to assume that the PMC fulltext is always the best available online copy; in some cases another site might have a better scan of the same article. I'm not sure we should implicitly ban access-date parameters from any source that is on PMC, either. In one case, the article was not on PMC; there was a PubmedID, and a link to the publisher's site for the fulltext. In this case, the automated edit had the effect of concealing the existence of a publicly-available fulltext. I suspect this may not be the intended behaviour; perhaps the tool was just expected to prevent there being two links to the same PMC page in a single citation?

Separately, I'm uneasy in giving precedence to PMC links over other links, as PMC and Pubmed contain third-party tracking content from data brokers, currently including Google and Qualtrics. I wrote to the NIH some years back about this, pointing out that it could give these actors sensitive medical information if people looked up things they or their friends had been diagnosed with. They did not want to engage on the topic. One of the links deleted was to the Europe PMC page, which admittedly looks no better, European data regulations aside. This is a complex question, and it might be a good idea to discuss it at Wikipedia Talk:MED. HLHJ (talk) 23:55, 10 October 2021 (UTC)

Without a diff, this is hard to assess. Headbomb {t · c · p · b} 11:59, 11 October 2021 (UTC)
{{wontfix}} without a diff. AManWithNoPlan (talk) 14:48, 15 October 2021 (UTC)

Nomination for deletion of Template:Inconsistent citations

Template:Inconsistent citations has been nominated for deletion. You are invited to comment on the discussion at the entry on the Templates for discussion page. * Pppery * it has begun... 03:12, 17 October 2021 (UTC)

Flag as {{notabug}} to archive, since discussion is pretty much universal agreement. AManWithNoPlan (talk) 15:53, 18 October 2021 (UTC)
Status
{{notabug}}, I have fixed the page.
Reported by
Headbomb {t · c · p · b} 06:55, 17 October 2021 (UTC)
What happens
Bot fails to run / crashes
Replication instructions
Try to run on List of organisms named after famous people (born before 1900)
We can't proceed until
Feedback from maintainers


Needs to do a folllow up edit

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 13:11, 17 October 2021 (UTC)
What happens
[22]
What should happen
Same + [23]
We can't proceed until
Feedback from maintainers


10.1073/pnas DOIs are only free after a set number of years, and the path through the code looks for "free" before it adds the year. I will see if it is easily fixable. AManWithNoPlan (talk) 17:44, 17 October 2021 (UTC)

CAPS: AIDS and Behavior / AIDS & Behavior

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 13:33, 17 October 2021 (UTC)
What should happen
[24]
We can't proceed until
Feedback from maintainers


Apologies

Looking through the bot's contributions just now, it looks like I accidentally ran Citation bot through Category:Terrorism twice in a row (at least it found some more things to fix the second time round, so a bit yay, I guess?), rather than once; this was not intentional on my part. I think what happened is I started the bot running on the category, thinking it had already finished going through Category:Sexism, only to see a few minutes later that it wasn't quite done with the earlier category (note: when running Citation bot through big jobs like these, I rely on monitoring the bot's contributions page, as the Citation bot console invariably errors out with a 502 or 504 gateway error well before the bot finishes with the job; as the bot's actions only show up in the contributions log when it actually finds something to correct, it often takes some degree of guesswork when I'm trying to figure out if it's finished a run or not). Upon finding this out, I waited somewhat longer for Citation bot to completely finish with Category:Sexism, and then, assuming (somewhat stupidly in retrospect) that the first attempt had been blocked by the still-going earlier run (and, thus, hadn't taken), I went back to the Citation bot console a second time and started it on Category:Terrorism again - only the first attempt hadn't been blocked after all, and the bot proceeded to queue up the second terrorist run right behind the first (and did not, as I'd assumed would happen in this kind of situation, block the second attempt). Oops.🤦‍♀️ Anyone whose runs didn't go through because of this, feel free to give me a well-deserved trouting right now. Whoop whoop pull up Bitching BettyAverted crashes 00:13, 18 October 2021 (UTC)

It happens. {{notabug}} AManWithNoPlan (talk) 15:44, 18 October 2021 (UTC)

Issue parameter removed from Cite magazine

Status
{{wontfix}} since so rare
Reported by
AlexandraIDV 22:19, 17 October 2021 (UTC)
What happens
Bot removes issue parameter from {{Cite magazine}}. Possibly related to how the issue in question (correctly) is number zero.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Changeling:_The_Dreaming&diff=1050449374&oldid=1040179562
We can't proceed until
Feedback from maintainers


Probably because issue=0 doesn't make much sense to begin with. Here, exceptionally, it makes sense. I've bypassed it here. Headbomb {t · c · p · b} 23:00, 17 October 2021 (UTC)

Caps: Obsidian II

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 23:34, 18 October 2021 (UTC)
What should happen
[25]
We can't proceed until
Feedback from maintainers


Kill my latest run?

I accidentally rebooted the bot on a list it already ran through [26]. Could you kill my run, and save ~842 mostly pointless article attempts? Headbomb {t · c · p · b} 01:28, 19 October 2021 (UTC)

I think that can be {{fixed}} if you click on https://citations.toolforge.org/kill_big_job.php AManWithNoPlan (talk) 18:45, 19 October 2021 (UTC)
I tried that, but got no visual feedback saying anything interesting, simply a blank page with some text message that didn't seem related to anything. Headbomb {t · c · p · b} 19:48, 19 October 2021 (UTC)

Caps: AIMS Microbiology

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 00:32, 20 October 2021 (UTC)
What should happen
[27]
We can't proceed until
Feedback from maintainers


Arbitrary cite type changes (part 2)

CB continues to make conversions like what I reported recently, for example here, here, here, and here. I am still under the impression that CB should not perform such changes; they are arbitrary, cosmetic, and often improper. IceWelder [] 13:16, 13 October 2021 (UTC)

Game Informer is a magazine, and is appropriately converted to a cite magazine. Likewise RPS isn't a journal, though the conversion to cite magazine there is incomplete. Le Monde is also a newspaper, so cite news is appropriate. Of all the diffs given, only the MCV thing is weird. Headbomb {t · c · p · b} 13:49, 13 October 2021 (UTC)
Game Informer and Le Monde have physical counterparts but the sources cited in both cases are strictly from the website, not printed. Game Informer having a magazine does not make any article published on the website a "magazine article" and vice versa. Cite magazine should be used for sources from physical mags; same with Cite news and Cite journal. As you correctly stated, RPS is not a journal, so CB changing |work= to |journal= is definitely incorrect. In fact, RPS has only ever had a website; a "conversion to cite magazine" should not occur. IceWelder [] 15:49, 13 October 2021 (UTC)
We've been over this before, online magazines are magazines. Headbomb {t · c · p · b} 17:10, 13 October 2021 (UTC)
I don't think we've been over this. The magazine and the website are managed completely separately, and the website's content match the description of an online magazine. Similarly, if RPS started releasing a magazine with the same branding but different coverage, the website wouldn't suddenly become an "online magazine" for having a physical magazine counterpart. IceWelder [] 18:17, 13 October 2021 (UTC)
See Special:Permalink/1047382929#magazine vs website and previous discussions. Headbomb {t · c · p · b} 18:28, 13 October 2021 (UTC)
Sorry, but I don't see any consensus in that discussion. The discussion merely shows that other users have disputed this behaviour; some form of consensus should be established, possibly through an RfC, at some point in the future. My original point stands that these kinds of edits are cosmetic and should not be performed by a bot. If it comes up in an FA review, that's another thing. The MCV and RPS changes are obvious errors and should not be performed by any party. IceWelder [] 18:51, 13 October 2021 (UTC)
I have fixed a couple more odd ways that things can get changed that were in the "do not change" list. AManWithNoPlan (talk) 14:47, 15 October 2021 (UTC)
Thanks. I see the bot still classifies RPS as a journal here and newly PC Gamer as a newspaper here. Do you know what the cause of this is? It is obviously incorrect. IceWelder [] 12:48, 16 October 2021 (UTC)
These are different. They involve adding new data into a work parameter, as opposed to changing which work parameter is used. The source of the oddities is Zotero, FYI. AManWithNoPlan (talk) 17:45, 17 October 2021 (UTC)
Would you consider not taking cite types from Zotero? It just classed Vanity Fair as a journal here. IceWelder [] 17:25, 18 October 2021 (UTC)
Add a bunch of websites to NON_JOURNAL_WEBSITES constant array. AManWithNoPlan (talk) 18:56, 19 October 2021 (UTC)
Sounds like a pain to maintain, IMO. I'll report here if I catch any other occurrences. Cheers. IceWelder [] 20:03, 19 October 2021 (UTC)

Is there a list somewhere of the most common websites for linking out? I would like to add a bunch of websites to the is_magazine, is_journal, etc lists AManWithNoPlan (talk) 12:34, 20 October 2021 (UTC)

WP:RSP and WP:VG/RS contain most of what I'll be seeing, as videogame-related articles make up most of my watchlist. IceWelder [] 12:39, 20 October 2021 (UTC)
Thank you. Hundreds of websites will be added. Should also greatly speed things up. AManWithNoPlan (talk) 13:56, 20 October 2021 (UTC)
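As a rough illustration of the approach being described (only the constant name comes from the comment above; the entries and helper function are made up for the example):
<syntaxhighlight lang="php">
<?php
// Sketch only: block Zotero's "journal" classification for hostnames known
// not to be journals. Entries are examples, not the bot's actual list.
const NON_JOURNAL_WEBSITES = ['rockpapershotgun.com', 'pcgamer.com', 'vanityfair.com'];

function allow_journal_from_zotero(string $url): bool {
  $host = preg_replace('/^www\./', '', strtolower((string) parse_url($url, PHP_URL_HOST)));
  return !in_array($host, NON_JOURNAL_WEBSITES, true);
}
</syntaxhighlight>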
Flagging as {{fixed}} with the addition of several hundred websites. AManWithNoPlan (talk) 14:10, 21 October 2021 (UTC)

British Newspaper Archive

Status
{{fixed}} once new code is deployed
Reported by
BrownHairedGirl (talk) • (contribs) 17:05, 24 October 2021 (UTC)
What happens
Citation bot replaces a bare URL ref to the British Newspaper Archive with a cite template whose only content is Register | British Newspaper Archive. e.g. {{Cite web|url=https://www.britishnewspaperarchive.co.uk/viewer/bl/0000425/18440304/002/0001|title=Register | British Newspaper Archive}}, which renders as: https://www.britishnewspaperarchive.co.uk/viewer/bl/0000425/18440304/002/0001. {{cite web}}: Missing or empty |title= (help)
What should happen
Nothing. If the bot can't actually fill the bare ref with something which describes the page's content, it should leave the link bare.
Relevant diffs/links
[28], [29], [30], [31], [32]
We can't proceed until
Feedback from maintainers


Thanks for the prompt fix, @AManWithNoPlan. --BrownHairedGirl (talk) • (contribs) 20:09, 24 October 2021 (UTC)

The bot now actively repairs the damage done by refill and the web archive bot. AManWithNoPlan (talk) 13:43, 25 October 2021 (UTC)
That's great, @AManWithNoPlan. Can you expand a little more on what those repairs are? --BrownHairedGirl (talk) • (contribs) 18:25, 25 October 2021 (UTC)
See this: https://en.wikipedia.org/w/index.php?title=Robert_Henderson_(rugby_union,_born_1900)&diff=prev&oldid=1051783031 and also deleting all the archives to that site, since they are login urls. AManWithNoPlan (talk) 20:32, 25 October 2021 (UTC)
Thanks for the link. That looks like the least-worst way to handle that problem. --BrownHairedGirl (talk) • (contribs) 21:02, 25 October 2021 (UTC)

Caps: UCLA

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 19:04, 26 October 2021 (UTC)
What should happen
[33]
We can't proceed until
Feedback from maintainers


Adding website field

This seems obvious, and maybe you already considered it before, but in this example would it make sense to include |website=www.comedy.co.uk when it is a cite web? -- GreenC 00:38, 27 October 2021 (UTC)

Initial thoughts: probably a big array that maps common ones to English, such as CNN instead of www.cnn.com, and a blacklist such as web.archive.org to not add. Finally, it should be the last thing done, in case the citation has a DOI. AManWithNoPlan (talk) 00:54, 27 October 2021 (UTC)
Yes blacklist for archive domains Wikipedia:List of web archives on Wikipedia plus these: (ghostarchive[.]org|conifer[.]rhizome.org|newspaperarchive|webarchiv[.]cz|digar[.]ee|bib-bvb[.]de|webcache[.]googleusercontent[.]com|timetravel[.]mementoweb|webrecorder[.]io|nla.gov.au) .. and new ones will become known. The mapping of domains to name would be nice. One method extract existing domains from URL and determine how most frequently written in work/website and use that as the default. eg. [[Time (magazine)|Time]] = www.time.com as most common usage .. such a table might be useful for other purposes as well. Almost like a separate program, build the table, download data on occasion and incorporate into the bot locally. -- GreenC 18:35, 27 October 2021 (UTC)
One way to improve this instead and which would benefit others would be to work on the Zotero translators behind Citoid, which I believe Citation bot references these days? Izno (talk) 18:40, 27 October 2021 (UTC)
No, do not do this. The work= parameter should be the name of the web site the link belongs to, not the address of the web site. The name of the web site in the diff given above is "British Comedy Guide", not "www.comedy.co.uk". Lots of people get this wrong but mass-automating this wrong interpretation is not the correct response to their mistakes. —David Eppstein (talk) 19:47, 27 October 2021 (UTC)
Please take a look at https://github.com/ms609/citation-bot/pull/3790/files. The array HOSTNAME_MAP is the list that we came up with. Do any look wrong, and are we missing common ones that should be there? AManWithNoPlan (talk) 20:45, 27 October 2021 (UTC)
You might consider replacing unlinked legacy.com with the linked Legacy.com although in that case the name and address seem to be the same. —David Eppstein (talk) 21:17, 27 October 2021 (UTC)
It is working great. I will look into adding more websites. AManWithNoPlan (talk) 23:34, 27 October 2021 (UTC)
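The idea in this thread, reduced to a sketch: map a hostname to a human-readable |website= value and refuse to add anything for archive hosts. Only the HOSTNAME_MAP name comes from the pull request linked above; the entries and helper below are illustrative.
<syntaxhighlight lang="php">
<?php
// Illustrative only; entries are examples, not the deployed list.
const HOSTNAME_MAP = [
  'cnn.com'         => 'CNN',
  'comedy.co.uk'    => 'British Comedy Guide',
  'theguardian.com' => '[[The Guardian]]',
];
const ARCHIVE_HOSTS = ['web.archive.org', 'archive.today', 'ghostarchive.org'];

function website_for_url(string $url): ?string {
  $host = preg_replace('/^www\./', '', strtolower((string) parse_url($url, PHP_URL_HOST)));
  if (in_array($host, ARCHIVE_HOSTS, true)) {
    return null;                       // never name an archive as the work
  }
  return HOSTNAME_MAP[$host] ?? null;  // add nothing if the host is unmapped
}
</syntaxhighlight>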

{{fixed}}

Removed PUBLISHER from citations

Status
{{notabug}}
Reported by
SpikeToronto 11:42, 29 October 2021 (UTC)
What happens
For some reason, the bot removed the |publisher= parameter from two uses of {{cite news}} (verify). I do not see that parameter in the list of deprecated/removed parameters at Template:Cite news.
What should happen
This parameter should not be removed by the bot.
Relevant diffs/links
Special:Diff/1052313664
We can't proceed until
Feedback from maintainers


When the publisher is basically the same as the work parameter, then it should not be included. Also, for most publications the publisher is not very important, such as academic journals, where the publishers generally have no control since the editors run the show. AManWithNoPlan (talk) 13:49, 29 October 2021 (UTC)
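A minimal sketch of that rule, assuming a simple normalisation step; this is not the bot's actual routine.
<syntaxhighlight lang="php">
<?php
// Sketch: drop |publisher= when it is essentially the same as the work value.
function publisher_is_redundant(string $publisher, string $work): bool {
  $norm = static function (string $s): string {
    return preg_replace('/^the\s+|[^a-z0-9]+/', '', strtolower(trim($s)));
  };
  return $norm($publisher) !== '' && $norm($publisher) === $norm($work);
}
// publisher_is_redundant('Forbes', 'Forbes') === true; 'The New York Times Company'
// vs 'The New York Times' is kept, since the normalised strings differ.
</syntaxhighlight>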

Another one

Newest case of incorrect cite type is here. Edge is a magazine and (formerly) a website; the website is cited. CB made it a newspaper. IceWelder [] 12:12, 29 October 2021 (UTC)

Not a newspaper, but a news source. I have added this to the could be both list. AManWithNoPlan (talk) 13:54, 29 October 2021 (UTC)
{{fixed}} AManWithNoPlan (talk) 14:55, 29 October 2021 (UTC)
Not all content on such websites is news, so {{cite news}} should be reserved for actual newspapers, I think. Also, just spotted this change where PC Gamer was classified as a newspaper. IceWelder [] 18:40, 29 October 2021 (UTC)
No, cite news is not just for newspapers. There is a reason it is as "news" and not "newspaper" for its title. This Citation Style 1 template is used to create citations for news articles in print, video, audio or web. Izno (talk) 18:47, 29 October 2021 (UTC)
Aha, but then changing cite web -> cite news is still redundant. |newspaper= is wrong in such cases either way. IceWelder [] 19:04, 29 October 2021 (UTC)
https://github.com/ms609/citation-bot/commit/31b6e9ce9e88ea5fd83638a7cba31f6f5b558fc7 much better use of that. AManWithNoPlan (talk) 18:49, 29 October 2021 (UTC)
Looks good, will see how it plays out. IceWelder [] 19:04, 29 October 2021 (UTC)

journals.lww.com

Status
{{fixed}}
Reported by
  — Chris Capoccia 💬 14:34, 29 October 2021 (UTC)
What happens
Replaces the lww.com URL in a journal cite with a DOI & DOI URL that is actually password protected and inaccessible. Replacing the URL with a DOI is generally fine, but the DOI URL seems bizarre, as it should be using the doi-access=free parameter. Also not sure if some rule is possible for this case, for DOIs that do not actually work
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


I have removed the journals.lww.com code. They shut that website down a while back, and obviously it is alive again. Weird. AManWithNoPlan (talk) 14:53, 29 October 2021 (UTC)

Transforming chapter DOI with URL should use chapter-url

Status
{{fixed}}
Reported by
  — Chris Capoccia 💬 19:26, 24 October 2021 (UTC)
What happens
transforming chapter DOIs like 10.1007/978-981-10-0266-3_45 adds URL using url parameter instead of chapter-url
What should happen
when Citation bot changes from cite journal to cite book and adds a URL, it needs to use chapter-url parameter
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


Adds expired PMC embargo date

Status
{{notabug}} I think
Reported by
Headbomb {t · c · p · b} 04:33, 1 November 2021 (UTC)
What happens
[34]
We can't proceed until
Feedback from maintainers


I am not sure what you are talking about. Please explain. AManWithNoPlan (talk) 14:28, 1 November 2021 (UTC)

Brain fart, misread the diff date. Headbomb {t · c · p · b} 17:42, 2 November 2021 (UTC)

Spurious journal

Status
{{fixed}}. Flagged as bad journal title
Reported by
Kanguole 08:58, 3 November 2021 (UTC)
What happens
treats the "Dissertations, Theses, and Capstone Projects" collection at CUNY as a |journal=
What should happen
the heading should not be treated as a journal; at most |series=CUNY Academic Works
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Old_Chinese&curid=475046&diff=1053315683&oldid=1049097603
We can't proceed until
Feedback from maintainers


Incorrect title due to link rot

Status
{{fixed}} once new code is deployed
Reported by
DDFoster96 (talk) 22:37, 3 November 2021 (UTC)
What happens
Bot adds an incorrect title for URLs which have rotted, taking the title from the redirected page e.g. "Useful website has shut down".
What should happen
The bot shouldn't add a title if the link has rotted. That'll be difficult to determine, for sure, but if there was a list of known problem domains/URLs the bot could avoid them in the future.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=The_Cone&diff=1053265687&oldid=1022976372
We can't proceed until
Feedback from maintainers


HOSTNAME_MAP

The discussion at User talk:Citation bot/Archive_28#Adding_website_field was archived too quickly, but during its brief appearance I bookmarked https://github.com/ms609/citation-bot/pull/3790/files to scrutinise the HOSTNAME_MAP array.

The issue I was looking for is websites which host more than one newspaper. The three examples I checked are:

  1. https://theguardian.com / https://guardian.co.uk — hosts both The Guardian (daily) and The Observer (Sundays)
  2. https://independent.co.uk — now just a website, but used to host both The Independent and The Independent on Sunday
  3. https://independent.ie — hosts both The Irish Independent and the Sunday Independent (Ireland)

In each case, HOSTNAME_MAP appears to be unaware of the Sunday variation. BrownHairedGirl (talk) • (contribs) 13:13, 31 October 2021 (UTC)

I have updated the ones on the list to wikilink to the website name. That makes it clearer than linking to the final landing page on Wikipedia. AManWithNoPlan (talk) 13:55, 31 October 2021 (UTC)
I will be double checking new additions for that problem. AManWithNoPlan (talk) 13:57, 31 October 2021 (UTC)
Many thanks, @AManWithNoPlan. I think it would be a good idea for someone to do a thorough check of existing entries, but sadly I haven't the time to do it myself. A good starting point would be Category:Sunday newspapers, which isn't very big. --BrownHairedGirl (talk) • (contribs) 21:55, 31 October 2021 (UTC)

{{fixed}}

Problems expanding full citation from DOI for Encyclopedia of Arabic Language and Linguistics

Status
{{notabug}}
Reported by
  — Chris Capoccia 💬 15:34, 5 November 2021 (UTC)
What happens
expanding {{cite journal |doi=10.1163/1570-6699_eall_EALL_COM_vol3_0247 }} with Citation Bot does not result in a normal book citation but somehow only gets the chapter title. I had to hand edit into {{cite book |first1=Kimary N. |last1=Shahin |chapter=Palestinian Arabic |title=Encyclopedia of Arabic Language and Linguistics |editor1-first=Lutz |editor1-last=Edzard |editor2-first=Rudolf |editor2-last=de Jong |doi=10.1163/1570-6699_eall_EALL_COM_vol3_0247 }}. There are a lot more examples of DOIs for the same book at Levantine Arabic.
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


DOIs consist of both meta-data and URL redirection. Some DOI providers do not provide any meta-data, and some only a few bits. That is all they provided: https://api.crossref.org/v1/works/10.1163/1570-6699_eall_eall_com_vol3_0247 AManWithNoPlan (talk) 15:58, 5 November 2021 (UTC)
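For anyone wanting to check a DOI's registrar metadata directly, here is a quick sketch using the public CrossRef endpoint quoted above (plain file_get_contents; not the bot's own lookup code):
<syntaxhighlight lang="php">
<?php
// Query CrossRef for whatever metadata the publisher deposited for this DOI.
$doi  = '10.1163/1570-6699_eall_EALL_COM_vol3_0247';
$json = @file_get_contents('https://api.crossref.org/v1/works/' . $doi);
$data = $json === false ? [] : (array) json_decode($json, true);
// Per the discussion above, this record carries little beyond the chapter
// title, so the bot has nothing more to expand the citation with.
var_export(array_keys($data['message'] ?? []));
</syntaxhighlight>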

sad to hear Brill couldn't get their act together and fill out the crossref info more completely.  — Chris Capoccia 💬 16:43, 5 November 2021 (UTC)

Messes with interwikilinks

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 06:29, 7 November 2021 (UTC)
What happens
[35]
We can't proceed until
Feedback from maintainers


Bare URL error

Status
{{wontfix}}
Reported by
Lugnuts Fire Walk with Me 18:07, 8 November 2021 (UTC)
What happens
For a word with an accent (say a French word) this happens
What should happen
This
We can't proceed until
Feedback from maintainers


This is beyond our ability to fix. https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/http%3A%2F%2Fwww.jpbox-office.com%2Ffichfilm.php%3Fid%3D8806 AManWithNoPlan (talk) 20:07, 8 November 2021 (UTC)
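For reference, a sketch of the same lookup the bot relies on: the title for a bare URL comes from the Wikipedia REST/Zotero citation endpoint linked above, so a mangled title for an accented page originates upstream of the bot. (A bare file_get_contents is used here for brevity.)
<syntaxhighlight lang="php">
<?php
// Ask the citoid/Zotero endpoint what it returns for the problem URL.
$target = 'http://www.jpbox-office.com/fichfilm.php?id=8806';
$api    = 'https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/' . rawurlencode($target);
$items  = json_decode((string) @file_get_contents($api), true);
echo ($items[0]['title'] ?? '(no title returned)') . "\n";
</syntaxhighlight>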

502 Bad Gateway error

Is there a reason why I'm getting a '502 Bad Gateway' error? I've been trying to use the bot for the article Nicole Kidman, but it keeps giving me this error twice now. Is the error occurring from my end, my internet or something? Or is something wrong with the page or tool? Is it perhaps because there are too many mistakes to fix that it overwhelms the system? Any suggestions on what to do? — Film Enthusiast 17:15, 3 November 2021 (UTC)

502 usually means the service is down in some way. I know another service is down right now so it may be systemic rather than this tool. Izno (talk) 17:44, 3 November 2021 (UTC)
The bot may have accepted the request even if you are shown a 502 error. Don't repeat the request a lot, give it some time. Abductive (reasoning) 16:38, 4 November 2021 (UTC)
I've waited for over a day now, just tried it again, and it gives me the same error. And I know it's not going through because there is no edit summary from the citation bot in the history tab, yet there are still several mistakes present in the citations that aren't being resolved. I'm manually going through each and every one to fix as many as I can, but I was hoping the bot could help out, not sure why it isn't working with me. — Film Enthusiast 18:33, 4 November 2021 (UTC)
I ran it just now on Nicole Kidman and it didn't make any changes. If you have a particular correction you think the bot should have made, you could make a separate section here on talk about that. The 502 error message (which is independent of any changes made or not made) is more likely on longer articles. Abductive (reasoning) 19:15, 4 November 2021 (UTC)

No longer overloaded. {{fixed}} AManWithNoPlan (talk) 22:31, 10 November 2021 (UTC)

Caps: La Plata

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 05:02, 9 November 2021 (UTC)
What should happen
[36]
We can't proceed until
Feedback from maintainers


Bot added chapter= to Template:cite journal

Status
{{wontfix}}, since it seems to be very rare, and it almost always points out an underlying problem.
Reported by
Jonesey95 (talk) 19:15, 23 October 2021 (UTC)
What happens
The bot added |chapter= to a {{cite journal}} template
What should happen
It should not do that, since |chapter= is not a supported parameter in that template. See this explanation for more information.
Relevant diffs/links
*https://en.wikipedia.org/w/index.php?title=Tetricus_I&type=revision&diff=1050598791&oldid=1049443277
  • https://en.wikipedia.org/w/index.php?title=East_Asian_cultural_sphere&type=revision&diff=1051438526&oldid=1051172382
We can't proceed until
Feedback from maintainers


Isn't this a bug in the template, not the bot? Whoop whoop pull up Bitching BettyAverted crashes 17:17, 27 October 2021 (UTC)
Or, more likely, "pilot error". Taking the first case: {{cite journal}} is wrong: The Oxford Dictionary of Late Antiquity is not a journal (it is not really a dictionary either, more an encyclopedia but...) So the citation should have been {{cite encyclopedia |encyclopedia=The Oxford Dictionary of Late Antiquity |title=Tetricus |url=http://www.oxfordreference.com/view/10.1093/acref/9780198662778.001.0001/acref-9780198662778-e-4640 |etc...}}. Agree? --John Maynard Friedman (talk) 18:09, 27 October 2021 (UTC)
And the second one isn't a journal either, though the problem is doubly complex because (presumably a previous bot run?) has tried to create |first= and |last= elements and made a complete regurgitated dog's breakfast of it:{{Cite journal |last=here] |first=[Author meta content |title=Vietnamese Historical Writing - Oxford Scholarship |year=2018 |publisher=Oxford University Press |url=https://oxford.universitypressscholarship.com/view/10.1093/oso/9780199225996.001.0001/oso-9780199225996-chapter-28 |language=en |doi=10.1093/oso/9780199225996.003.0028 |isbn=978-0-19-922599-6}} It does seem to have chapters (in this case, chapter 28) so |chapter= was not wrong. So what should CitationBot do? Well, if it is ultra-clever, it would say "I see evidence of chapters, journals don't have chapters, citation is not valid, raise red flag". Well that's an easy fix, shouldn't take more than ten minutes [sic]. --John Maynard Friedman (talk) 18:23, 27 October 2021 (UTC)
Pilot error for sure for the first one. Created at this edit.
The second was created at this edit; classic inattentive-editor-assuming-that-visual-editor/citoid-output-is-flawless. Alas, it really comes down to the metadata that citoid scrapes when creating a citation; gigo. Of course, gigo doesn't forgive editor inattention.
—Trappist the monk (talk) 18:38, 27 October 2021 (UTC)
This error category almost universally points to some type of human error. https://en.wikipedia.org/wiki/Category:CS1_errors:_chapter_ignored AManWithNoPlan (talk) 14:59, 29 October 2021 (UTC)

Citation Bot makes unwanted, incorrect changes

Status
{{fixed}}
Reported by
TeemPlayer (talk) 23:38, 6 November 2021 (UTC)
What happens
The 'dq=' was changed to 'q=' for all the urls in refs
What should happen
Nothing, they should be left as 'dq=', because 'q=' only takes you to the top of a cited page, while 'dq=' takes you directly to the actual highlighted text on the page.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=List_of_television_series_canceled_after_one_episode&type=revision&diff=1053839140&oldid=1051073508
We can't proceed until
Feedback from maintainers


TeemPlayer (talk) 23:38, 6 November 2021 (UTC)

Citation Bot makes unwanted, incorrect deletion

Status
{{fixed}}
Reported by
TeemPlayer (talk) 23:57, 6 November 2021 (UTC)
What happens
The '#v=onepage' is deleted from the url in a cite. Separately, a bare URL ref was converted into <ref>{{Cite web|url=http://www.conceptcarz.com/article/article.aspx?articleID=3548|title = An Error Has Occured!}}</ref>
What should happen
Nothing. The error message should not be used as an article title.
I suspect that the website has a malformed error page, so some workaround may be needed
Relevant diffs/links
[37]
We can't proceed until
Feedback from maintainers


Also, went back and fixed a dozen pages with such bad titles - about half from refill and that other old bot. AManWithNoPlan (talk) 14:29, 13 November 2021 (UTC)

Thanks for another prompt fix! And for going the extra steps to cleanup previous errors.
It's sad to see how many websites can't even return proper error codes. BrownHairedGirl (talk) • (contribs) 20:52, 13 November 2021 (UTC)

caps

Status
{{fixed}} for this journal title.
Reported by
deisenbe (talk) 01:40, 17 November 2021 (UTC)
What happens
Capitalized word in title (On) changed to lower case
What should happen
Nothing, leave it alone
Relevant diffs/links
[38]
We can't proceed until
Feedback from maintainers


This should be generalized to all prepositions in final words/all final words. Headbomb {t · c · p · b} 05:10, 17 November 2021 (UTC)
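A sketch of that generalisation, assuming a simple word-by-word title-case pass (the word list and function are illustrative, not the bot's code): lower-case short words only when they are neither the first nor the last word of the title.
<syntaxhighlight lang="php">
<?php
// Illustrative: interior prepositions/articles are lower-cased; the final
// word is always left alone, so a journal title ending in "On" keeps its capital.
const LOWERCASE_WORDS = ['a', 'an', 'and', 'for', 'in', 'of', 'on', 'the'];  // example list

function title_case(string $title): string {
  $words = preg_split('/\s+/', trim($title));
  $last  = count($words) - 1;
  foreach ($words as $i => $w) {
    if ($i !== 0 && $i !== $last && in_array(strtolower($w), LOWERCASE_WORDS, true)) {
      $words[$i] = strtolower($w);
    }
  }
  return implode(' ', $words);
}
</syntaxhighlight>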

Adds issue= to non-periodical conference proceedings paper in Citation Style 2

Status
{{fixed}}, will no longer query HDL if the DOI is in CrossRef and works.
Reported by
David Eppstein (talk) 08:13, 17 November 2021 (UTC)
What happens
Special:Diff/1055674920
What should happen
The IEEE site that the doi redirects to is down ("temporarily unavailable") so I can't tell whether the issue number added in this edit should really be a volume number in a series or something else, but in any case, definitely not that.
We can't proceed until
Feedback from maintainers


Update: Turns out the bogus issue number is actually the number of the technical report version linked in the hdl= parameter (which links to a technical report preprint of the same paper). It's still the wrong thing for the bot to have added to the citation, but at least maybe that information will help narrow down how to fix the bot so it doesn't continue making this kind of mistake. —David Eppstein (talk) 09:03, 17 November 2021 (UTC)

Gateway error

Status
{{wontfix}} since bot is simply overloaded and this is described in documentation
Reported by
Jo-Jo Eumerus (talk) 17:22, 17 November 2021 (UTC)
What happens
502 "bad gateway" errors
Replication instructions
I've been trying to run the bot on User:Jo-Jo Eumerus/Proxima Centauri b
We can't proceed until
Feedback from maintainers


Please wait an hour at least and try again. But first make sure bot did not actually run already. AManWithNoPlan (talk) 17:31, 17 November 2021 (UTC)

it's going to be a lot more than an hour. but look at the "user contributions" page. everyone and their brother is working on some job of 2000 pages. citation bot is slowly trudging through the list and might be available for individual pages tomorrow maybe. also look through the discussion in this page. there are already way too many comments related to the overloaded state of the bot.  — Chris Capoccia 💬 22:15, 17 November 2021 (UTC)

CAPS: SPUMS J

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 18:56, 17 November 2021 (UTC)
What should happen
[39]
We can't proceed until
Feedback from maintainers


Journal of the Royal Asiatic Society Hong Kong Branch

Status
{{fixed}}, rare but bad. Tests added too.
Reported by
Underwaterbuffalo (talk) 07:02, 21 November 2021 (UTC)
What happens
journal= Journal of the Royal Asiatic Society Hong Kong Branch is being replaced by journal= Royal Asiatic Society Hong Kong Branch
What should happen
journal= Journal of the Royal Asiatic Society Hong Kong Branch should be kept, since the name of the journal is "Journal of the Royal Asiatic Society Hong Kong Branch" in full
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Nga_Tsin_Wai_Tsuen&diff=1056242765&oldid=1049618918
We can't proceed until
Feedback from maintainers


bot changes cite journal to cite document

Status
{{fixed}}
Reported by
Trappist the monk (talk) 13:52, 21 November 2021 (UTC)
What happens
bot changes {{cite journal}} to {{cite document}}, a redirect to {{cite journal}}, so a more-or-less pointless exercise
What should happen
in this particular case, the source is a book (the templates have |chapter= and |isbn= as clues) so the template should have been changed to {{cite book}}
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


News agency parameter

Status
{{notabug}} - the agency parameter is not the correct parameter for citations directly to UPI, etc.
Reported by
TheTechnician27 (Talk page) 22:27, 22 November 2021 (UTC)
What happens
Citation bot changes news agencies from 'agency=' to 'work='. This is patently incorrect.
What should happen
Not do this; change citations with Associated Press, Reuters, United Press International, etc. links from 'work=' or 'publisher=' to 'agency=' for 'cite news' templates.
We can't proceed until
Feedback from maintainers


bot added |issue= when |number= already present

Status
{{fixed}}
Reported by
Trappist the monk (talk) 14:48, 22 November 2021 (UTC)
What happens
|issue= and |number= in {{cite journal}} are exact aliases of each other. In this case the value assigned to |number= appears to be incorrect while the value that the bot assigned to the new |issue= seems to be correct. Because both are present, and because only one is allowed, Module:Citation/CS1 emits the redundant parameter error message.
What should happen
bot should check for the existence of aliases before adding new parameters
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


That was obscure and rare. Thank you for the report. AManWithNoPlan (talk) 15:32, 23 November 2021 (UTC)
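The suggested check, as a sketch with an abridged alias table (the parameter groups here are examples; the bot's real alias data lives elsewhere):
<syntaxhighlight lang="php">
<?php
// Before adding a parameter, make sure no alias of it is already present.
const PARAM_ALIASES = [
  'issue' => ['issue', 'number'],
  'work'  => ['work', 'journal', 'magazine', 'newspaper', 'website'],
];

function can_add_param(array $params, string $new_param): bool {
  foreach (PARAM_ALIASES[$new_param] ?? [$new_param] as $alias) {
    if (array_key_exists($alias, $params)) {
      return false;
    }
  }
  return true;
}
// can_add_param(['number' => '37'], 'issue') === false, so |issue= would not be added.
</syntaxhighlight>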

Caps: Drug Des Deliv

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 01:34, 23 November 2021 (UTC)
What should happen
[40]
We can't proceed until
Feedback from maintainers


Adds |volume=n/a |issue=n/a

Status
{{fixed}}
Reported by
IceWelder [] 19:03, 24 November 2021 (UTC)
What happens
The bot adds "|volume=n/a |issue=n/a" to {{cite journal}}
Relevant diffs/links
[41]
We can't proceed until
Feedback from maintainers


Only four requests at a time?

Status
new bug
Reported by
Abductive (reasoning) 22:54, 2 August 2021 (UTC)
What happens
It seems that the bot can only work on four jobs at any one time.
We can't proceed until
Feedback from maintainers


I sampled the bot's edits going back a few days, and it seems that the bot can interleave only four requests at any one time, and will not even accept a single page beyond that. At no point can I find five jobs interleaving, and (although this is harder to be certain about) at no point when there are four jobs interleaving can a fifth job be found, not even a single requested page. Is this deliberate, and if yes, is it really a necessary constraint on the bot? Abductive (reasoning) 22:54, 2 August 2021 (UTC)
That is what I have observed and complained about also. I am convinced that the default PHP config is 4. Someone with tool service access needs to get the bot a custom lighttpd config file. AManWithNoPlan (talk) 23:03, 2 August 2021 (UTC)
Gah. Abductive (reasoning) 23:07, 2 August 2021 (UTC)
lol, you people with "jobs". the rest of us with single page requests can't get anything in no matter how many jobs.  — Chris Capoccia 💬 11:20, 3 August 2021 (UTC)
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Lighttpd Look at PHP and the "Default_configuration" area that starts collapsed. AManWithNoPlan (talk) 19:18, 3 August 2021 (UTC)

This is also part of the wider problem that the bot needs much more capacity, and also that a lot of its time is taken up by speculative trawls through wide sets of articles which have not been identified as needing bot attention and which often produce little change. Huge categories are being fed to the bot, which changes little over 10% of them, and most of those changes are trivial (type of quote mark in a title) or have no effect at all on output (removing redundant parameters or changing template type). It would help a lot if those speculative trawls were given a lower priority. --BrownHairedGirl (talk) • (contribs) 22:54, 9 August 2021 (UTC)

Who would decide what "speculative trawls" are? And what should the limit be? It might be hard to find something that can be agreed on. Perhaps the users that request these large categories see them as very important, while you don't. Of course it will be easy to know that certain specially created maintenance categories will give a high output, and do a lot of work, but if a user just wants to request a "normal" category they can't know beforehand what percentage of the pages will actually get changed.
I agree capacity should be increased, more jobs at the same time would be such a good thing, however deciding that one page might be more important than another does not fix the root cause.
I do think there might be something to be said for giving priority to people who request a single page (or a small number of pages). A person could be running the most important category that exists, but if I just want to have a single page checked about a topic that I am knowledgeable about or have a big interest in, it is a hard swallow waiting for multiple thousand-page jobs to finish. This has actually made me just give up a few times, leaving pages that could have been fixed and checked (with my knowledge about said subject) broken; I'm sure many can recognise themselves in this.
It is indeed important to fix high-priority pages, and especially to improve capacity, but let's not forget about the people who edit on topics that they enjoy, and just want to use the bot on something that might not be important according to some maintenance category, but is important to them. The more people that want to keep using the bot, the better! Redalert2fan (talk) 00:22, 10 August 2021 (UTC)
@Redalert2fan: I think you are missing my point, which is that there is no benefit to anyone in having the bot processing lots of articles where there is nothing for it to do. No matter how important anyone thinks an article is, there is no gain in having the bot spend ages deciding that there is nothing to do.
The reason that single pages get locked out is because its capacity is being used up by these speculative trawls, by which I mean simply that they are not categories selected because they concentrate articles which need the bot's attention -- these are "see if you find anything" batches, rather than "cleanup the problem here" batches.
One or two editors are repeatedly feeding it big categories on a huge variety of topics simply because they are big categories which fit under the 4,400 limit for categories. I have analysed the results, and in many cases the result is that only 10%–15% of the pages are edited, and only about half of those have non-trivial changes. So about 95% of the time that the bot spends on these huge categories is completely wasted.
When a resource is limited, it is best used by prioritising pages which have been selected on the basis that there is a high likelihood of something to do. --BrownHairedGirl (talk) • (contribs) 00:44, 10 August 2021 (UTC)
I see; there is no denying that there is no gain in having the bot spend ages deciding that there is nothing to do.
Wouldn't it be an even quicker fix to ask these few editors why they run these not-so-immediately-helpful categories, and notify them of the problems it causes and the benefits that can be gained by not requesting such categories? It seems more like operator error than a bot mistake, and limiting the bot's abilities for something that is caused by a few users seems maybe questionable?
I agree with the points you make, but I don't feel like we should limit hundreds of potential editors that request pages with what you would describe as "less than optimal requests" because of 2 or so people. Even though we are limited, I don't think we need a strict priority system. If some random editor wants to request a hundred pages that have their interest, we can't expect everyone to know beforehand if their request is an optimum use of the system, and "see if you find anything" might still be an honest use case. Obviously if, as you say, specific editors that seem to know what they are doing use the bot in a way that basically blocks out others for little gain all the time, action should be taken.
Some sort of priority system might indeed be a good idea, whether it is in the way of important maintenance categories, "the pages with a high likelihood of something to do", or just giving priority to small requests etc. Though it has to be a priority system for some types of requests, not a limitation for all requests, in my opinion, especially if the problem comes from a very minor selection of users. Redalert2fan (talk) 01:11, 10 August 2021 (UTC)
Max category size shrunk by one-quarter. AManWithNoPlan (talk) 13:08, 10 August 2021 (UTC)
Thanks, AManWithNoPlan. That's helpful, but wouldn't it be better to reduce it to the same 2200 limit size as the linked-from-page limit? --BrownHairedGirl (talk) • (contribs) 14:32, 10 August 2021 (UTC)
@Redalert2fan: Since early July, when I started using Citation bot to clean up bare URLs, I have seen two editors repeatedly using the bot unproductively.
One was JamCor, who was using the bot to process the same set of almost 200 articles 3 or 4 times per day. Many of the articles were huge, taking several minutes each to process, so I estimated that about 20% of the bot's time was being taken on those 200 articles. I raised it on JamCor's talk, and they stopped, but only after the second request.
The other is Abductive, with whom I raised the problem several times on this page: see User talk:Citation_bot/Archive 27#throttling_big_category_runs. Sadly, that one persists, and I gave up making my case. When I started writing this post a few hours ago in response to you, I analysed the then-current recent contribs of the bot. Abductive had started the bot scanning Category:Use dmy dates from August 2012, and by then the bot had processed 1433 of the category's 4379 pages, but had saved an edit on only 141 of them, i.e. less than 10%. As with many of Abductive's previous big runs, I can't see any way in which this run could have been selected as a set which concentrates articles of interest to Abductive, or which concentrates articles of high importance, or which concentrates articles that have been identified as being likely to have problems this bot can fix. The only criterion which I can see for its selection is that its size (4379 pages) is very close to the 4400 maximum size of Citation bot category jobs. A quick glance at the parent Category:Use dmy dates shows that very few categories are so close to the limit without exceeding it.
So AFAICS, the only reason for selecting this job was that it is a big set of articles which can be thrown at the bot with no more effort than copy-pasting the category title. I may of course have missed something, and if so I hope that Abductive will set me right. --BrownHairedGirl (talk) • (contribs) 14:33, 10 August 2021 (UTC)
I meant cut to one fourth, not cut by one fourth. So, category is now half the linked pages API. AManWithNoPlan (talk) 14:37, 10 August 2021 (UTC)
Cut to 1100 items! This is extreme. Grimes2 (talk) 14:53, 10 August 2021 (UTC)
@AManWithNoPlan: thanks. 1100 is much better.
However, that will reduce but not eliminate the problem of an editor apparently creating bot jobs just because they can. Such jobs will now require 4 visits to the webform in the course of a day, rather than just one, but that's not much extra effort. --BrownHairedGirl (talk) • (contribs) 15:13, 10 August 2021 (UTC)
People using the bot is not a problem. Abductive (reasoning) 18:02, 10 August 2021 (UTC)
Indeed, people using the bot is not a problem.
The problem is one person who repeatedly misuses the bot. --BrownHairedGirl (talk) • (contribs) 18:27, 10 August 2021 (UTC)
It is not possible to misuse the bot. Having the bot make the tedious decisions on what needs fixing is far more efficient than trying to work up a bunch of lists. Unfortunately, even the best list can be ruined if the API that the bot checks happens to be down. This is why it is inadvisable to create lists that concentrate on one topic. Abductive (reasoning) 19:49, 10 August 2021 (UTC)
Bot capacity is severely limited. There is no limit to how much editors can use other tools to make lists, so that makes more efficient use of the bot.
Think of the bot like a hound, which is far more effective at finding quarry if started in the right place. The hound will waste a lot of time if started off miles away from the area where previous clues are.
Lots of other editors are targeting the bot far more effectively than your huge category runs. --BrownHairedGirl (talk) • (contribs) 22:11, 10 August 2021 (UTC)
Hey BrownHairedGirl, I agree with your ideas, but in the end there are no rules for what the bot can be used for, so calling it misuse isn't a fair description. Anyone is allowed to use it for anything. Abductive can request what he wants, and creating bot jobs just because you can is allowed. In my eyes every page is valid to check (provided it isn't just a repeat of the same page or groups of pages frequently). Redalert2fan (talk) 00:13, 11 August 2021 (UTC)
Just to be sure, whether that is the optimal way to use the bot or not is still a fair point of discussion. Redalert2fan (talk) 00:17, 11 August 2021 (UTC)
The question of self-restraint by users of an unregulated shared asset is a big topic in economics.
The article on the tragedy of the commons is an important read. It's well written but long. If you want a quick summary, see the section #Metaphoric meaning.
In this case, it would take only 4 editors indiscriminately feeding the bot with huge sets of poorly-selected articles to create a situation where 90% of the bot's efforts changed nothing, and only 5% did anything non-trivial. That would be a tragic waste of the fine resource which the developers and maintainers of this bot have created, and would soon lead to calls for regulation. The question now is whether enough editors self-regulate to avoid the need for restrictions. --BrownHairedGirl (talk) • (contribs) 05:30, 11 August 2021 (UTC)
@AManWithNoPlan: the new limit of 1100 does not seem to have taken effect; see this[42] at 18:07, where the bot starts work on a category of 1156 pages.
That may be due to expected delays in how things get implemented, but I thought it might help to note it. --BrownHairedGirl (talk) • (contribs) 18:51, 10 August 2021 (UTC)
Bot rebooted. AManWithNoPlan (talk) 20:15, 10 August 2021 (UTC)
Max category cut again to 550, and the bot now prints out the list of category pages so that people can use the linked pages API instead, which also means that if the bot crashes the person can restart it where it left off instead of redoing the whole thing as it does with category code. AManWithNoPlan (talk) 20:25, 10 August 2021 (UTC)
Great work! Thanks. --BrownHairedGirl (talk) • (contribs) 22:02, 10 August 2021 (UTC)

It seems that the low-return speculative trawls have re-started. @Abductive has just run a batch job of Category:Venerated Catholics by Pope John Paul II; 364 pages, of which only 29 pages were actually edited by the bot, so 92% of the bot's efforts on this set were wasted. The lower category limit has helped, because this job is 1/10th of the size of similar trawls by Abductive before the limit was lowered ... but it's still not a good use of the bot. How can this sort of thing be more effectively discouraged? --BrownHairedGirl (talk) • (contribs) 11:57, 27 August 2021 (UTC)

A number of editors have pointed out to you that using the bot this way is perfectly acceptable. In addition, there are almost always four mass jobs running, meaning that users with one article can't get access to the bot. A run of 2200 longer articles takes about 22 hours to complete, so if I had started one of those, it would have locked such users out for nearly a day. By running a job that lasted less than a hour, I hoped that requests for smaller and single runs could be accommodated. And, in fact, User:RoanokeVirginia was able to use the bot as soon as my run completed. Abductive (reasoning) 18:14, 27 August 2021 (UTC)
@Abductive: on the contrary, you are the only editor who repeatedly wastes the bot's time in this way. It is quite bizarre that you regard setting the bot to waste its time as some sort of good use.
On the previous two occasions when you did it, the result was that the limits on job size were drastically cut. --BrownHairedGirl (talk) • (contribs) 18:47, 27 August 2021 (UTC)
That was in response to your complaints. Since I ran a job that was within the new constraints, I was not misusing the bot. You should request that the limits be increased on manually entered jobs, and decreased on category jobs. There is no particular reason that 2200 is the maximum. Abductive (reasoning) 18:52, 27 August 2021 (UTC)
@Abductive: you continue to evade the very simple point that you repeatedly set the bot to do big jobs which achieve almost nothing, thereby displacing and/or delaying jobs which do improve the 'pedia. --BrownHairedGirl (talk) • (contribs) 19:04, 27 August 2021 (UTC)
Using the bot to check a category for errors is an approved function of the bot. The fundamental problem is the limit of 4 jobs at a time. Also, the bot is throttled to run considerably slower than it could, which is a holdover from the time when it was less stable. The various throttlings, which as I recall were implemented in multiple places, should be re-examined and the bot re-tuned for its current capabilities. Abductive (reasoning) 19:11, 27 August 2021 (UTC)
This is not complicated. Whatever the bot's speed of operation, and whatever the limit on concurrent jobs, its capacity is not well used by having it trawl large sets of pages where it has nothing to do. I am surprised that you repeatedly choose to ignore that. --BrownHairedGirl (talk) • (contribs) 19:19, 27 August 2021 (UTC)
I am not ignoring anything. Bots exist to do tedious editing tasks. Your notion that editors have to do the tedious work before giving the bot a task is contrary to the purpose of bots. A number of proposals have been put forward to improve bot performance or relieve pressure on the bot, such as allowing multiple instances of the bot, or allowing users to run the bot from their userspace. These proposals have not been implemented. As the bot is currently configured, there will always be load problems. Abductive (reasoning) 19:29, 27 August 2021 (UTC)
Load problems that you are exacerbating. We've requested a million times to have better scheduling, or more resources, but no dice so far. You're cognizant there's an issue, and yet you repeatedly feed the bot low-priority low-efficiency work. That's pretty WP:DE / WP:IDIDNTHEARTHAT behaviour from where I stand. Headbomb {t · c · p · b} 19:34, 27 August 2021 (UTC)
I have been holding off of all different kinds of runs lately. Check the bot's edits for the last week or so. Abductive (reasoning) 00:58, 28 August 2021 (UTC)
Abductive, increasing the bot's capacity would:
  • require a lot of work by the editors who kindly donate their time to maintain and develop this bot. WP:NOTCOMPULSORY, and they should not be pressed to donate more time. Their efforts are a gift from them, not a contract.
  • exacerbate to some extent the usage limitations of the external tools which the bot relies on. Increasing the speed of the bot's operation will mean that those limits are encountered more frequently.
The bot will probably always have load problems, because there is so much work to be done.
Two examples:
  1. Headbomb's jobs of getting the bot to clean up refs to scholarly journals. That is high-value, because peer-reviewed journals are the gold standard of WP:Reliable sources, and it is also high labour-saving because those citations are very complex, so a big job for editors to fix manually. It is high-intensity work for the bot because many of the articles have dozens or even hundreds of citations. I don't know Headbomb's methodology for building those jobs or what numbers can be estimated from that, but I assume that tens of thousands of such pages remain to be processed.
  2. my jobs targeting bare URLs are focused on a longstanding problem of the core policy WP:V being undermined by linkrot, which may become unfixable. I have lists already prepared of 75,000 articles which need the bot's attention, and have a new methodology mostly mapped out to tackle about 300,000 more of the 450k more articles with bare URL refs.
My lists are (like Headbomb's lists) all of bot-fixable problems, so they don't waste the bot's time, but they do not tackle such high-value issues as Headbomb's list, so I regard mine as a lesser priority than Headbomb's.
So whatever the bot's capacity, there will be enough high-priority, high-efficiency work to keep it busy for a long time to come. It is not at all helpful for that work to be delayed or displaced because one editor likes to run big jobs but doesn't like doing the prep work to create productive jobs.
In the last few weeks I have approached 4 editors about what seemed to me to be poor use of the bot.
Only Abductive persists. --BrownHairedGirl (talk) • (contribs) 20:57, 27 August 2021 (UTC)
Reading the discussion above, I think that this issue is becoming increasingly adversarial and perhaps a concrete proposal for action would be a way to fix this.
This could include:
1) If it is an easy technical fix (the maintainers would need to chime in on this), bringing the PHP issue to someone with tool service access and increasing the bot's capacity
2) Adopt a definition/policy on "speculative trawling", perhaps with a notice on the bot page to nudge users into considering the bot's limited capacity.
3) Any other ideas?
@Abductive @BrownHairedGirl @Headbomb @Redalert2fan RoanokeVirginia (talk) 23:11, 27 August 2021 (UTC)
Other ideas:
  1. The idea of revisiting the bot's basic tuning on its dwell times, throttling and other patches now that stability has been improved deserves consideration. If the bot could run just a touch faster at every step, the overall load would be reduced. Abductive (reasoning) 00:46, 28 August 2021 (UTC)
  2. Set aside one of the four channels for single use. Abductive (reasoning) 00:46, 28 August 2021 (UTC)
    @Abductive: setting aside one channel for single use wouldn't help much, because in addition to huge batch jobs, Abductive has simultaneously been flooding the bot with masses of individual page requests. See e.g. this set of 1,500 bot edits. By using my browser's Ctrl-F to search in the page, I find that Abductive | #UCB_webform matches 370 edits (a batch job of 2200 pages), and a search for Abductive | #UCB_toolbar matches a further 114 pages.
    So basically, Abductive has been WP:GAMING the limits by occupying one of the bot's 4 channels with a batch job, and then monopolising another channel with some sort of systematic flood of single jobs. (This also happened on at least one previous day this week.)
    Note those 114 single use edits (the toolbar edits) are only the pages which were actually edited. It is likely that there were other toolbar requests which did not lead to the bot making an edit.
    Note also that the 114 toolbar edits spanned a period from 00:58, 28 August 2021 to 08:36, 28 August 2021. That is after @Headbomb's warning[43] at 19:34 27 August about WP:DE and WP:IDHT behaviour ... and it is 18 days after @AManWithNoPlan cut the limit on category batches twice in response to Abductive's abuse of the higher limit.
    It is also 28 days since I rejected Abductive's offer to be a meatpuppet for me (see User talk:Abductive/Archive_21#Talk_page_usage) by running some of my bareURL batches. That would have amounted to my jobs effectively taking two of the bot's four channels, which would be WP:GAMING the bot's scheduler and impeding other editors' access to the bot.
    This looks to me like intentional WP:DE, which will not be resolved by anything along the lines @RoanokeVirginia's thoughtful proposals. The only solution I can see here is some sort of restraint on Abductive's use of Citation bot. --BrownHairedGirl (talk) • (contribs) 18:19, 28 August 2021 (UTC)
  3. Request a Citation bot 2. I think this is something any one of us can do, correct? Abductive (reasoning) 00:46, 28 August 2021 (UTC)
Thanks for the thoughtful suggestions, @RoanokeVirginia.
  1. A capacity increase would be very welcome, but it is very unlikely that any remotely feasible capacity increase could remove the need to use that capacity efficiently and effectively. So Abductive's conduct would still be a problem.
  2. a statement on effective use of the bot sounds like a good idea, but I don't expect that a nudge would have any impact on an editor who has repeatedly rejected nudges.
Having been using AWB heavily for 15 years, and run several bot tasks, I am used to the fact that big jobs usually require a lot of preparation if they are to be done efficiently and accurately. So doing prep work and test runs before feeding a big job to Citation bot is second nature to me, and probably also to Headbomb.
By contrast, Abductive appears to want to A) continually set the bot off on big runs with no clear purpose or selection criteria, just because they can, and B) objects to any prep work. The problem is the combination of A and B. Many other editors use Citation bot without extensive prep work, but they usually do so for short bursts or for bigger jobs targeted on a particular topic area. The problem with Abductive's jobs is that they lock up one of the bot's job slots near permanently, often with very low return, for no apparent reason. --BrownHairedGirl (talk) • (contribs) 15:35, 28 August 2021 (UTC)
I don't object to prep work, I'm just pointing out that it is inefficient. If, as you say above, you have tens to hundreds of thousands of bare url articles to run by the bot, shouldn't you be accommodated somehow? I have been running 2200 of your bare urls at a time when it looks like the load on the bot is low, and holding off on runs when it looks like the load is high. Abductive (reasoning) 18:40, 28 August 2021 (UTC)
@Abductive: I have not supplied you with a list of bare URLs, so you could not have been running 2200 of your [i.e. BHG's] bare urls. You rejected my suggestion (at User talk:Abductive#A_citation_bot_job) that you tackle the pages which transclude {{Bare URL inline}}, and its transclusion count has dropped by only 84 in the last 25 days, so I doubt you were tackling that.
But my main point here is that you have been WP:GAMING the system by flooding the bot with hundreds of toolbar requests while also running a big batch job, so your holding off claim seem to me to be bogus. --BrownHairedGirl (talk) • (contribs) 19:07, 28 August 2021 (UTC)
As a matter of fact I have been running from {{Bare URL inline}}, and nobody is gaming the system. Using the bot for its intended purposes is legitimate. Abductive (reasoning) 19:14, 28 August 2021 (UTC)
@Abductive: Gaming the bot's limits is not legitimate.
And I see no evidence of your claim that you have been targeting {{Bare URL inline}}. Can you provide any evidence of that claim? --BrownHairedGirl (talk) • (contribs) 19:21, 28 August 2021 (UTC)
I drew a list from Category:CS1 errors: bare URL and have been running them 2200 at a time. Just now I requested a run of 2200 drawn directly from {{Bare URL inline}}. See if that brings down your metric. Don't forget that the bot isn't very good at fixing bare urls. Abductive (reasoning) 20:10, 28 August 2021 (UTC)
@Abductive: the bot's latest list of contribs shows a batch of 220 pages submitted by you.
Your use of a webform rather than links from a page impedes scrutiny, because only the pages which have actually been edited are visible to other editors. However, checking the recent bot contribs for the pages which have been edited, the list appears to consist partly of pages which transcluded {{Bare URL inline}}, e.g. [44],[45],[46] ... but also some pages which appear to have never transcluded {{Bare URL inline}}, e.g. COVID-19 vaccination in Malaysia (see bot edit [47], and the unsuccessful WikiBlame search for "Bare URL inline"[48]).
So your current batch is a mix of {{Bare URL inline}} and something else ... and the fact that you have started that batch now does not alter the fact that the evidence so far indicates that your claim to have been processing batches of 2200 {{Bare URL inline}} pages is false. My bare URL cleanup jobs average about 30% of pages being completely cleared, and the first 100 of your latest batch shows 46/100 having bare URLs replaced. So if you had previously run batches of 2200 {{Bare URL inline}} pages, we should have seen multiple instances of 700 pages being cleared. That has not happened.
And you have still not addressed the issue of how you have been WP:GAMING the bot's limits by stashing up hundreds of toolbar pages whilst running a big batch. --BrownHairedGirl (talk) • (contribs) 20:53, 28 August 2021 (UTC)
I have been running the bot on thousands of members of the bare url category, and just now on the list of transclusions of {{Bare URL inline}}. Why there should be a difference in its ability to clear one more than the other I cannot say. Abductive (reasoning) 21:04, 28 August 2021 (UTC)
@Abductive: Thank you for finally confirming that your claims to have previously processed batches of 2200 pages transcluding {{Bare URL inline}} were false.
Making repeated false assertions is not collaborative conduct. --BrownHairedGirl (talk) • (contribs) 21:12, 28 August 2021 (UTC)
I have been running the bot on thousands of bare urls, for you. I did not deliberately make a false statement. Abductive (reasoning) 21:16, 28 August 2021 (UTC)
@Abductive: So the evidence I can see indicates that your statements are false.
I have not provided you with any lists of articles with bare URLs. You have been processing pages in Category:CS1 errors: bare URL, which is a related-but-different issue which you chose. And in the last few hours, you started processing one batch of 2200 pages, some of which transclude {{Bare URL inline}} ... but until this evening, I see no other such batch.
I have asked you for evidence to support your claims, and you have provided none. If you wish to sustain your assertion that you have previously submitted batch jobs of pages which transclude {{Bare URL inline}}, please provide some diffs of the bot's edits in some batches, or better still some contribs lists which show multiple bot edits on those batches. --BrownHairedGirl (talk) • (contribs) 21:42, 28 August 2021 (UTC)
Nobody is misusing the bot. Requesting a category run is a function of the bot. Requesting individual runs of articles is a major purpose of the bot, and it should be noted that if all four channels are in use, nobody can request the bot to run on an article they just created. Just about any random category run these days will only achieve 30%; so what does it matter if it is a big or small category? You are running the bot on a collection of articles in which you are achieving only 30% of your goal of fixing bare urls. But you are not misusing the bot either, or gaming the system by running as close to 2200 as you can each time. Abductive (reasoning) 21:04, 28 August 2021 (UTC)
@Abductive: it is hard to assess whether you misunderstand the issue being discussed, or whether you are being intentionally evasive.
Indeed, Requesting individual runs of articles is a major purpose of the bot ... but your rapid requests of hundreds of individual articles within a few hours while you are already running a max-size batch are in effect running a second batch, and thereby WP:GAMING the bot's limits. If that is not clear to you, then we have a big problem.
Your assertion that Just about any random category run these days will only achieve 30% is demonstrably false in two respects:
  1. this thread has plenty of evidence of you feeding random categories to the bot, and getting edit rates of only ~10%, including one category with only a 7% edit rate.
  2. you are conflating two different percentages. My bot runs of pages with bare URLs usually clear, on a first pass, all the bare URLs on about 30% of the pages. However, on other pages in the set, it removes some of the bare URLs and/or makes other changes. So the total edit rate is usually well over 50%.
This is beyond ridiculous. --BrownHairedGirl (talk) • (contribs) 21:32, 28 August 2021 (UTC)
The fact that your bare urls get above 50% is one of the reasons I am assisting you in running them 2200 at a time. Individual requests fix errors, and do more to hold open a channel (from runs) than to clog a channel. Occasional low-productivity runs on categories are a matter of bad luck. My runs are in no way impeding your efforts, and in fact are helping them, so you should not be at all concerned about my use of the bot. Abductive (reasoning) 21:42, 28 August 2021 (UTC)
@Abductive: This is ridiculous.
  1. No, you are not running my bare URL jobs 2200 at a time. As I explained above, you are not running them at all: Category:CS1 errors: bare URL is a separate issue.
  2. But hundreds of rapid individual jobs do clog a channel, as surely as if they were a batch. They are in effect a second channel.
  3. My runs are in no way impeding your efforts. Again, nonsense: your routine use of two channels slows down the bot, and impedes other editors from even starting new jobs.
  4. so you should not be at all concerned about my use of the bot. This is a severe case of WP:IDHT.
I fear that this may have to be escalated. --BrownHairedGirl (talk) • (contribs) 22:00, 28 August 2021 (UTC)
Don't you think your efforts should be directed towards improving this and other bots, and not at a misguided effort to get me to stop using the bot "wrong"? Abductive (reasoning) 22:05, 28 August 2021 (UTC)
I would dearly love to be able to focus on my work, rather than having to divert my efforts into trying to dissuade one WP:IDHT editor from persistently disrupting my work and others' work by systematically abusing this bot and repeatedly making misleading and/or false assertions when I attempt to discuss the problem. --BrownHairedGirl (talk) • (contribs) 22:55, 28 August 2021 (UTC)
I never deliberately made false assertions, and I am not abusing the bot or disrupting anything. I am using the bot as intended, and doing a pretty good job of it. Abductive (reasoning) 23:04, 28 August 2021 (UTC)
If your false assertions were unintentional, then I fear that you may not comprehend the issues adequately.
The bot has twice had to be reconfigured to prevent your abuse, so you are demonstrably not using it as intended.
And if you really think that repeatedly wasting the bot's time by feeding it huge categories which need almost no bot work is what the bot was intended for, then you have a very odd idea of what bots are intended for. --BrownHairedGirl (talk) • (contribs) 23:12, 28 August 2021 (UTC)
Reconfigured to make the bot use its 4 channels more effectively. And a review of the totality of my requests will show that I am getting above average results. Abductive (reasoning) 23:40, 28 August 2021 (UTC)
No, you have not "Reconfigured to make the bot use its 4 channels more effectively". All you have done is to selfishly and disruptively grab 2 of the 4 channels for yourself. This makes the bot work no faster, and use its time no more effectively; your WP:GAMING of the limits just gives you a bigger slice of the bot's time and denies other editors access to the bot.
You offer no evidence at all of the claimed effectiveness of your requests; that is just an empty boast. It is not possible to assess that without access to the bot's logs, because the bot's contribs lists provide no way of telling whether any of your single requests led to no edit. For all we can see from the contribs list, it may well be that the edit rate for your single requests is as low as the 7% you got from a category. --BrownHairedGirl (talk) • (contribs) 04:20, 29 August 2021 (UTC)
At no point am I selfishly or disruptively grabbing channels. First off, that's not even possible, and second, the bot is improving the encyclopedia when it makes corrections. Additionally, they often happen to be bare url articles, a project of yours. How is that selfish? Nobody is calling you selfish for running thousands of articles past the bot. Abductive (reasoning) 06:00, 29 August 2021 (UTC)
You are cherry-picking evidence. Why not mention my recent category run that got lucky and found and corrected at a 70% rate? And to suggest that a handful of individual requests makes it impossible to assess my overall rate is silly; it is easy to assess my overall rate using just the category and multiple runs. But why are you even interested in my activities? As I said before, they are not impeding your runs, and in fact I am running your bare url finds. Right now, for instance, I am not running anything, because the bot seems to be heavily loaded. Abductive (reasoning) 06:00, 29 August 2021 (UTC)

Please find another place to argue. Thanks. --Izno (talk) 16:32, 29 August 2021 (UTC)

@Izno: surely this is the appropriate place to address systematic misuse of the bot? --BrownHairedGirl (talk) • (contribs) 17:31, 29 August 2021 (UTC)
You said it yourself, you may need to escalate. I am suggesting you do so, because it seems clear to me that you will not be able to persuade Abductive here. Izno (talk) 18:08, 29 August 2021 (UTC)
Fair enough, @Izno. When it recurs, I will escalate. --BrownHairedGirl (talk) • (contribs) 20:11, 29 August 2021 (UTC)

IMDB website in 1963

In this edit https://en.wikipedia.org/w/index.php?title=Doctor_Who&type=revision&diff=1057375628&oldid=1057016709 the bot added |date=21December 1963. To the best of my knowledge, the IMDB was not active in 1963. Obviously the bot is confusing the date of the TV show with the date the entry was added to IMDB. It made the same addition a week earlier, which I reverted. Is there an easy way to make the bot avoid doing this?  Stepho  talk  11:47, 27 November 2021 (UTC)

added to NO_DATE_WEBSITES array. AManWithNoPlan (talk) 13:10, 27 November 2021 (UTC)
will be {{fixed}} once code is deployed. AManWithNoPlan (talk) 13:11, 27 November 2021 (UTC)

Author Clean up

Status
enhancement request? {{wontfix}}
Reported by
John Maynard Friedman (talk) 12:02, 28 November 2021 (UTC)
What happens
author names of the form last1=Alpha, Bravo are not expanded
What should happen
deconstruct to last1=Alpha | first1=Bravo
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=At_sign&curid=710197&diff=1057535596&oldid=1054679806&diffmode=source
We can't proceed until
Feedback from maintainers


Ignoring the fact that the bot edit was purely cosmetic, could it actually have done something useful? Here is the (updated) citation:

  • Chai, Yan; Guo, Ting; Jin, Changming; Haufler, Robert E.; Chibante, L. P. Felipe; Fure, Jan; Wang, Lihong; Alford, J. Michael; Smalley, Richard E. (1991). "Fullerenes wlth Metals Inside". Journal of Physical Chemistry. 95 (20): 7564–7568. doi:10.1021/j100173a002.

This might usefully have been changed to last1=Chai |first1= Yan | last2=Guo | first2= Ting etc.

Is this (a) easy to implement and (b) worth the effort? --John Maynard Friedman (talk) 12:02, 28 November 2021 (UTC)

The bot edit[49] was not purely cosmetic. It added a date and modified an ISBN. --BrownHairedGirl (talk) • (contribs) 17:13, 28 November 2021 (UTC)
Commas may originate from organization names, hence citation bot should avoid modifying them. --Izno (talk) 19:07, 28 November 2021 (UTC)
In general, whatever's in the author/last/first is often completely garbled up. It would be really nice to have tracking categories for this, but any author-related cleanup should be done semi-automatically if anything. Headbomb {t · c · p · b} 19:40, 28 November 2021 (UTC)
We have maintenance categories for > 1 commas and > 0 semicolons for each of the authorship parameters, ex Category:CS1 maint: multiple names: authors list‎. Izno (talk) 20:10, 28 November 2021 (UTC)
I would love to see the Bot do this, but first of all the number of people made angry would be insane to behold and second of all the coding and exceptions would be very hard (impossible?) to do. AManWithNoPlan (talk) 22:39, 28 November 2021 (UTC)
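For what it's worth, the mechanical part of the split is trivial; the hard part is exactly what is discussed above, knowing when not to apply it. A deliberately naive sketch (not something the bot does, and with made-up sample values):

```php
<?php
// Naive split of a "last1=Alpha, Bravo" style value into last1/first1.
// Deliberately refuses to touch anything with more than one comma or a
// semicolon, since those often indicate organisations or multiple people
// (see the CS1 maint categories mentioned above). Still far too crude to
// run unsupervised.
function split_author(string $value): ?array {
    if (substr_count($value, ',') !== 1 || str_contains($value, ';')) {
        return null;                              // ambiguous – leave it alone
    }
    [$last, $first] = array_map('trim', explode(',', $value));
    if ($last === '' || $first === '') {
        return null;
    }
    return ['last1' => $last, 'first1' => $first];
}

print_r(split_author('Chai, Yan'));           // [last1 => Chai, first1 => Yan]
print_r(split_author('Smith, Jones & Co.'));  // wrongly "splits" an organisation –
                                              // the kind of false positive discussed above
```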

Ballotopedia

Changed web to news and newspaper at revision 1058214656 by Citation bot (talk) at Valencia, California Fettlemap (talk) 04:59, 2 December 2021 (UTC)

{{fixed}} AManWithNoPlan (talk) 14:53, 3 December 2021 (UTC)

Removal of access indicators

I'm tired of having to undo edits like this, which stealthily remove paywall and related access indicators. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:34, 2 December 2021 (UTC)

{{fixed}} over a year ago. AManWithNoPlan (talk) 23:39, 2 December 2021 (UTC)
Do you mean that such edits are no longer made, are made differently, or now display the indicator? Because until recently no indicator was displayed. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:26, 3 December 2021 (UTC)

Adds magazine=[[Billboard (magazine)]]

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 07:24, 3 December 2021 (UTC)
What happens
[50]
What should happen
magazine=[[Billboard (magazine)|Billboard]]
We can't proceed until
Feedback from maintainers


Affects many magazines, like Vanity Fair (magazine), Billboard (magazine), Songlines (magazine), and newspapers like Dawn (newspaper), The Referee (newspaper), etc... If other dabs are used (like journal=[[Foobar (journal)]]), those too should be updated. Headbomb {t · c · p · b} 07:24, 3 December 2021 (UTC)

Bot makes cosmetic edits solely to bypass citation template redirects

Status
{{fixed}}
Reported by
BrownHairedGirl (talk) • (contribs) 22:53, 29 September 2021 (UTC)
What happens
bot makes edits solely to bypass a redirect to a CS1/CS2 template. This makes absolutely zero difference to how the page is rendered.
What should happen
In the example below, nothing should happen. Per WP:NOTBROKEN and WP:COSMETICBOT, such changes should be made only if they are accompanied by a non-cosmetic change.
Relevant diffs/links
[51], [52], [53], [54]
We can't proceed until
Feedback from maintainers


Ping @AManWithNoPlan: please can you try to fix this? --BrownHairedGirl (talk) • (contribs)

I will look into it. AManWithNoPlan (talk) 18:56, 23 November 2021 (UTC)
Some of these changes are needed for other tools to work on the citation templates. Are there some particular changes that you think are the most problematic? AManWithNoPlan (talk) 21:48, 23 November 2021 (UTC)
@AManWithNoPlan: sadly, there are quite a lot of problematic edits, in several varieties, e.g.:
  • [55], [56], [57], [58], [59], [60] -- in each case changed two spaces in "cite web" to one space.
  • [61] -- changed two spaces in "cite news" to one space.
Those edits are all from the first 34 edits in the current contribs list[62] and all from the same batch suggested by you. Did you actually select a whole run of 280 pages based on that cosmetic fix? BrownHairedGirl (talk) • (contribs) 23:16, 23 November 2021 (UTC)
That's some new code that I am testing out. The multiple-space code is new; I thought the bot would detect it as a minor edit, but I was wrong. AManWithNoPlan (talk) 23:23, 23 November 2021 (UTC)
My question is: many of these are really typo-catching changes and some are minor typos. Which ones are which? The extra spaces break basically all the various citation-modifying bots. AManWithNoPlan (talk) 23:26, 23 November 2021 (UTC)
Really?
Allowing for multiple spaces in template titles (e.g. \{\{\s*cite\s+web) is for me a basic piece of encoding that I do even in short AWB runs. Are there really bots which are sophisticated enough to be allowed to modify citations, but so crude that they don't handle multiple spaces?
The multiple space thing is a good tweak as an accompaniment to a major edit, but like AWB's WP:GENFIXES (which includes the multiple space fix), it shouldn't be standalone. Same with canonicalising template names, which is also part of WP:GENFIXES for some templates; if the cite family of templates are not on that list, they should be added. BrownHairedGirl (talk) • (contribs) 00:23, 24 November 2021 (UTC)
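For illustration only (this is not Citation bot's actual code), the kind of whitespace-tolerant matching described above looks something like this in PHP; the sample strings are made up:

```php
<?php
// Match "cite web" regardless of extra whitespace or capitalisation,
// in the spirit of the \{\{\s*cite\s+web pattern discussed above.
$samples = ['{{cite web|url=x}}', '{{ Cite  web |url=x}}', '{{cite news|url=x}}'];

foreach ($samples as $wikitext) {
    if (preg_match('~\{\{\s*cite\s+web\b~i', $wikitext)) {
        echo "matches: $wikitext\n";
    }
}
// Only the first two samples match; "cite news" does not.
```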
That helps a lot. The bot should no longer do the minor edits based upon spaces like that (unless other edits are done too). AManWithNoPlan (talk) 13:29, 24 November 2021 (UTC)
@AManWithNoPlan: I am glad that helped. And well done getting those minor edits not done as a standalone task. It would be great if parameter name changes could also be minor, and not done as standalone edits.
(BTW, sorry that there was a typo in my regex. I had written \{\{\a*cite\s+web) but that should of course have been \{\{\s*cite\s+web)
I have been looking into AWB's handling of template redirects. WP:AutoWikiBrowser/General fixes#Template_Redirects_(TemplateRedirects) says that AWB uses the WP:AutoWikiBrowser/Template redirects rule page, per WP:Redirect#Template redirects.
So I looked at Wikipedia:AutoWikiBrowser/Template_redirects#Citation_templates, and none of the CS1/CS2 templates are listed.
I think that they should all be included. What do you think? If you like the idea, I would be happy to do the leg work of proposing it at WT:AutoWikiBrowser/Template redirects. Also pinging @Headbomb and Trappist the monk:, hoping for their thoughts. BrownHairedGirl (talk) • (contribs) 11:09, 25 November 2021 (UTC)
https://github.com/ms609/citation-bot/blob/master/Template.php See the massive array starting with $fix_it. Extra spaces is a different bit of code though. AManWithNoPlan (talk) 15:46, 27 November 2021 (UTC)
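As a rough sketch of what that kind of canonicalisation involves (the map entries below are invented for illustration and are not copied from Template.php), a redirect-to-canonical lookup might look like:

```php
<?php
// Hypothetical map of template-name redirects to their canonical CS1 names.
// Entries are illustrative only; the bot's real list lives in the $fix_it
// array in Template.php, linked above.
$canonical = [
    'citeweb'      => 'cite web',
    'cite website' => 'cite web',
    'web cite'     => 'cite web',
];

function canonical_name(string $name, array $map): string {
    // Collapse runs of whitespace and lower-case before the lookup.
    $key = strtolower(trim(preg_replace('~\s+~', ' ', $name)));
    return $map[$key] ?? $key;
}

echo canonical_name('Citeweb', $canonical), PHP_EOL;    // cite web
echo canonical_name('Cite  Web', $canonical), PHP_EOL;  // cite web (spaces collapsed)
```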

NPR may be news, but is not a newspaper

Status
{{fixed}}
Reported by
— JohnFromPinckney (talk / edits) 05:45, 23 November 2021 (UTC)
What happens
Bot changes citation pointing to www.npr.org from cite web to cite news (okay, I guess; that's what NPR does) but changes |via=NPR to |newspaper=NPR
What should happen
I'm not brave enough to wade through all the aliases, but the bot should either retain cite web and change |via=NPR to |website=NPR, or change to cite news but use something more appropriate like |work=NPR.
Relevant diffs/links
Special:Diff/1056715334
We can't proceed until
Feedback from maintainers


Bot won't fill YouTube links

The bot won't fill refs to YouTube.com. This is a nuisance, because in the 20211120 database dump, there are 10,060 articles with bare links to YouTube. That is about 4% of all remaining pages with WP:Bare URLs, so filling them would make a big dent in the backlog.

I tested the bot on the following pages with bare LINK refs to YouTube.com (from this search): Chris Daughtry, Petrol engine, James Mason, Model–view–controller, CBBC (TV channel), House of Gucci, Luke Combs, Flute, Josh Peck, Bloodhound Gang, and Pauly Shore.

So far as I can see from the bot's output, the Zotero server returns no info for YouTube links, which is obviously outside the bot's control. However, I wonder if it would be possible for the bot to do a direct lookup on those pages? Even if the bot just filled the cite param |title= with the data from the YouTube page's <meta name="title"> tag, that would be a useful step forward. BrownHairedGirl (talk) • (contribs) 19:45, 26 November 2021 (UTC)
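Purely as a sketch of that fallback idea (an assumed approach, not the bot's implementation), a direct lookup of the page's title metadata could look like this in PHP; the function name is invented and the commented-out URL is hypothetical:

```php
<?php
// Fetch a YouTube page and pull the <meta name="title"> content, as a
// minimal fallback when the Zotero server returns nothing. Real code
// would use cURL with timeouts, a user agent, and error handling.
function youtube_meta_title(string $url): ?string {
    $html = @file_get_contents($url);
    if ($html === false) {
        return null;
    }
    if (preg_match('~<meta\s+name="title"\s+content="([^"]*)"~i', $html, $m)) {
        return html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5);
    }
    return null;
}

// Hypothetical usage:
// echo youtube_meta_title('https://www.youtube.com/watch?v=XXXXXXXXXXX');
```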

You gotta pay to use the API: https://developers.google.com/youtube/v3/getting-started and page redirects can be a problem. I have not really looked into it. AManWithNoPlan (talk) 01:25, 27 November 2021 (UTC)
ReFill tool does it, but you have to do that manually, and it does have bugs. AManWithNoPlan (talk) 01:28, 27 November 2021 (UTC)
We explicitly block youtube because of lower quality results in the past. Seems to be doing much better these days from a couple test runs on the zotero server. AManWithNoPlan (talk) 01:31, 27 November 2021 (UTC)
Yes, @AManWithNoPlan WP:ReFill handles lots of domains which other tools cannot handle, but it can run amok when it encounters a page which redirects to the home page rather than returning a 404 error, so using it requires a lot of tedious manual work undoing the refs which it mangled.
ReferenceExpander is much better, but it strips out {{webarchive}} templates (e.g. [63]) and text which wouldn't be within the cite template.
So we don't have any batch tool to fill Youtube refs, and the single page tools have other problems.
Do you think you could unblock YouTube on a trial basis, to see if it can helpfully tackle this backlog? BrownHairedGirl (talk) • (contribs) 11:28, 27 November 2021 (UTC)
Unblocked once code is deployed. That can take a while with existing jobs running. AManWithNoPlan (talk) 13:13, 27 November 2021 (UTC)
Many thanks, @AManWithNoPlan. If we could get even partial success without a significant error rate, that would be a huge help.
I will leave it for a day or two for the new code to deploy, then try a sample batch of about 500 of the 10,060 pages with bare URL YouTube refs.
Do you perhaps have a list of other blocked URLs? If I had such a list, I could incorporate it into my database scans, and thereby help target the bot at pages with potentially fixable bare URLs. BrownHairedGirl (talk) • (contribs) 15:26, 27 November 2021 (UTC)
https://github.com/ms609/citation-bot/blob/master/constants/bad_data.php see ZOTERO_AVOID_REGEX array. AManWithNoPlan (talk) 15:44, 27 November 2021 (UTC)
Thanks. The list is shorter than I expected, leaving me with only this to incorporate into my scans: (twitter\.com|google\.com/search|ned\.ipac\.caltech\.edu|pep\-web\.org|ezproxy|arkive\.org|worldcat\.org|kyobobook\.co\.kr|facebook\.com|leighrayment\.com|scholarlycommons\.pacific\.edu\/euler\-works|miar\.ub\.edu\/issn|britishnewspaperarchive\.co\.uk|pressreader\.com|ebooks\.adelaide\.edu\.au). BrownHairedGirl (talk) • (contribs) 18:36, 27 November 2021 (UTC)
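A minimal sketch of how that list could feed such a scan (the regex below is abbreviated from the one above, and the URLs are made up):

```php
<?php
// Skip bare URLs that the bot's Zotero stage avoids anyway, so a scan only
// lists pages the bot has a chance of fixing. Abbreviated ZOTERO_AVOID_REGEX.
$zotero_avoid = '~(twitter\.com|google\.com/search|facebook\.com|worldcat\.org)~i';

$bare_urls = [
    'https://twitter.com/example/status/1',
    'https://example.org/some-article',
];

$worth_sending = array_filter($bare_urls, fn ($url) => !preg_match($zotero_avoid, $url));
print_r($worth_sending);   // only the example.org URL survives the filter
```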
Hi @BrownHairedGirl: @AManWithNoPlan: Many but not all of the bare refs in your regex are dead YouTube videos, which is probably why Reflinks messes up. See Special:Diff/1057461974 for a merge of my youtube preempt/postrempt fixes with the bare ref fixes. Here is another example: Special:Diff/1057466650 The next time I do a YouTube archive run, I'll remember these links. Also, your regex counts bare refs that are commented out for whatever reason; see the Coldplay and Jared Leto articles, resulting in an exaggeration of the count; I don't know how much it sways the count. Rlink2 (talk) 20:37, 27 November 2021 (UTC)
@Rlink2: I always include commented-out refs in my bare URL searches, because some of the tools I use can't exclude them, and I want consistent results. I think that the numbers are low, but I will do a test run on the database to check.
How are you able to use AWB to add archive links? BrownHairedGirl (talk) • (contribs) 20:58, 27 November 2021 (UTC)
@Rlink2: I just did a database scan for bare URLs to test the effect of ignoring comments:
  • Articles with non-PDF bare links including brackets from database dump 20211120 = 290,854 pages
  • Articles with non-PDF bare links including brackets from database dump 20211120, ignoring comments = 290,210 pages
So only a 0.22% reduction in the number of pages. I reckon that's not worth worrying about. BrownHairedGirl (talk) • (contribs) 00:35, 28 November 2021 (UTC)
Due to the nature of YouTube links I was able to fiddle around with the AWB advanced settings to aid with YouTube archive links. It's not automatic, however; I still need to verify every link manually, it's just easier to do with AWB. For adding archive links, AWB is "WB" - Wiki Browser with no "auto". Rlink2 (talk) 21:46, 27 November 2021 (UTC)

{{fixed}} AManWithNoPlan (talk) 15:53, 5 December 2021 (UTC)

CAPS: IUCN Red List of Threatened Species

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 20:41, 4 December 2021 (UTC)
What should happen
[64]
We can't proceed until
Feedback from maintainers


Caps: AIDS

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 20:50, 4 December 2021 (UTC)
What happens
1
What should happen
[65]
We can't proceed until
Feedback from maintainers


Caps: TheMarker

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 20:51, 4 December 2021 (UTC)
What should happen
[66]
We can't proceed until
Feedback from maintainers


Caps: BioScience

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 20:56, 4 December 2021 (UTC)
What should happen
[67]
We can't proceed until
Feedback from maintainers


Only if that's the full name of the journal. Journal of Bioscience shouldn't be capitalized that way. Headbomb {t · c · p · b} 20:56, 4 December 2021 (UTC)

Caps: Algebra i Logika

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 21:01, 4 December 2021 (UTC)
What should happen
[68]
We can't proceed until
Feedback from maintainers


Caps: BJU International

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 21:02, 4 December 2021 (UTC)
What should happen
[69]
We can't proceed until
Feedback from maintainers


Caps: CLA Journal

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 21:07, 4 December 2021 (UTC)
What should happen
[70]
Replication instructions
.
We can't proceed until
Feedback from maintainers


Caps: ELH

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 21:08, 4 December 2021 (UTC)
What should happen
[71]
We can't proceed until
Feedback from maintainers


Caps: ESC Heart Failure

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 21:09, 4 December 2021 (UTC)
What should happen
[72]
We can't proceed until
Feedback from maintainers


Caps: Publications Mathématiques de l'IHÉS

Status
{{fixed}}
Reported by
Headbomb {t · c · p · b} 21:14, 4 December 2021 (UTC)
What should happen
[73]
We can't proceed until
Feedback from maintainers


Bot causing "missing last1=" error in Cite journal

Status
{{fixed}}
Reported by
Wotheina (talk) 06:02, 5 December 2021 (UTC)
What happens
Bot added "first1=" to a Cite journal citation, causing error "first1= missing last1= ".
What should happen
Bot edits should not cause template errors. If the citation database's author data has only one string, then just don't insert half-baked "first1=" parameters, or choose template parameters that do not cause such first–last pairing errors.
Relevant diffs/links
Special:Diff/1051019665's last (3rd) edit
We can't proceed until
Feedback from maintainers


Since the author of the source I cited is an organization (Japanese 大阪市立衛生試験所), not a person, I avoided first-last pairing parameters and put it in "editor=", so this bot edit was unnecessary to begin with. But if the bot still wants to fill parameters in the "author=" class, then it should at least take care of easily imaginable and most probable cases such as single words (organizations), names of more than three words, languages with no spaces, "anonymous" in various abbreviations or non-English languages, multiple persons in one line, etc. Note that I am not asking the bot to determine whether the string is a person or a non-person, which should be impossible. I'm just asking that if the bot can't fill both parts of a first-last pair, then to stop. Wotheina (talk) 06:02, 5 December 2021 (UTC)

Not making any judgment about what's right, or guessing why the bot suggested that output, but I believe the CS1 templates are happy enough if they have a |last= value, even when there's no |first= value. Vice-versa, as you've observed, doesn't work.
Now then, above you say "the author of the source I cited is an organization", so you entered it as an editor. I don't like using organisations or "Staff" for author parameters, but if you've got an author, say it's the author. If what you have is an editor, use the editor parameters. Don't try to outsmart the templates, because it messes up the metadata. — JohnFromPinckney (talk / edits) 08:06, 5 December 2021 (UTC)
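A tiny illustrative guard (not the bot's actual logic) for the failure reported here: since CS1 accepts |lastn= alone but not |firstn= alone, a firstn= value should only ever be added when the matching lastn= is already set.

```php
<?php
// Hypothetical check: only add firstN= when the matching lastN= exists and
// is non-empty, since first= without last= produces a CS1 error.
function can_add_first(array $params, int $n): bool {
    return isset($params["last$n"]) && trim($params["last$n"]) !== '';
}

$params = ['editor' => '大阪市立衛生試験所'];   // organisation stored as editor
var_dump(can_add_first($params, 1));           // bool(false) – so no first1= is added
```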