User talk:Citation bot/Archive 4


Request for examples and test cases

I'm continuing work on author-handling. Having good examples to work from will help me handle tricky and special cases. If you have citations that have been problematic in the past or which you think would make good test cases, please either drop a link to the diff + line number here or copy the to-be-fixed citation to the sandbox I've been using on testwiki: User:Fhocutt (WMF)/Sandbox. Thank you all for the input and suggestions so far, and any resources you can offer here. --Fhocutt (WMF) (talk) 21:42, 15 September 2015 (UTC)

See above for sample bug reports related to author names:
  • author= converted to authors= and author=
  • Bot unnecessarily adding last2, last3, last4, ... parameters
  • Bot added |first1= when |first= was already present
  • Bot found |first9=LH et al. and added |author10=and others and |displayauthors=9
  • Bot used "author4=and others" in place of real author #4 on a 7-author reference
  • duplicated last name
  • Butchered author names
  • Deprecated parameter |author-separator= added
  • |display-authors=9 no longer necessary for exactly nine authors
  • Bot creates CS1 errors when attempting to parse authors= parameter containing many names
  • Remove display-authors=etal when inserting all the remaining authors
Let us know if you need additional feedback and testing. – Jonesey95 (talk) 18:36, 16 September 2015 (UTC)

Thank you! I've added the examples above to my testwiki sandbox.

Please test the tool now. It should not modify authors when author name-related parameters exist, including the new vauthors. However, it should fetch and expand author data when available if there are no existing parameters. You can help by reporting bugs here or at https://phabricator.wikimedia.org/T111891.
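A minimal sketch of the rule described above, written in Python for illustration only (the bot itself is not written in Python). The parameter-name pattern and the function names `has_author_params` and `merge_fetched_authors` are hypothetical, and treating a blank value as "not present" is an assumption rather than confirmed bot behaviour.

```python
import re

# Hypothetical pattern for author-name parameters (author, last2, vauthors, ...).
# The bot's real list of recognised names may differ.
AUTHOR_PARAM = re.compile(
    r"^(?:author|authors|coauthor|coauthors|vauthors|last|first)\d*$"
)


def has_author_params(params):
    """Return True if any author-name parameter has a non-blank value."""
    return any(
        AUTHOR_PARAM.match(name.strip().lower()) and value.strip()
        for name, value in params.items()
    )


def merge_fetched_authors(params, fetched):
    """Add fetched (last, first) pairs only when no author data exists yet."""
    result = dict(params)
    if has_author_params(params):
        return result  # existing author data is left untouched
    for n, (last, first) in enumerate(fetched, start=1):
        result[f"last{n}"] = last
        result[f"first{n}"] = first
    return result
```

With this sketch, a citation that already carries |vauthors= or |author= comes back unchanged, while a citation with no author parameters gains numbered |lastn= and |firstn= pairs.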

Known issues:

  • Will still modify editors, regardless of whether editor name parameters are present. Does this need to be fixed for the tool/bot to be used?

It should convert curved quotes to "'" in fetched author data, but I don't have any references to serve as a test case for this. If you do, please leave them here or in my testwiki sandbox. --Fhocutt (WMF) (talk) 01:03, 18 September 2015 (UTC)
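For anyone building a test case, the conversion could look roughly like the Python sketch below. It is an illustration, not the bot's code: the function name `straighten_quotes` is hypothetical, and limiting the mapping to the two single curly quotation marks is an assumption based on the "'" target mentioned above.

```python
# Assumed mapping: only the single curly quotes are converted to "'".
CURLY_TO_STRAIGHT = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (typographic apostrophe)
})


def straighten_quotes(name):
    """Replace curly single quotes in a fetched author name with "'"."""
    return name.translate(CURLY_TO_STRAIGHT)
```

A fetched name containing a typographic apostrophe would be a suitable input, since the expected output differs from it in exactly one character.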


This is a good candidate for starting to add automated tests to the bot's codebase. You can help by commenting here or on the Phabricator task with examples of citations with strange formatting and edge cases--spaces in strange places, multiline parameters or values, and similar. The idea here is to have a better way to make sure that the bot continues to parse template parameters and values correctly, even when changes are made to the code. Your help is appreciated. --Fhocutt (WMF) (talk) 03:41, 3 October 2015 (UTC)

You can start with some of the bug reports on this page:
  • Bot should add more than four editors
  • Issue & Number
  • Comments cause trouble
  • Bot 579 added doi-broken-date when doi-inactive-date was already present
  • |display-authors=9 no longer necessary for exactly nine authors
  • Removes accessdate for no-URL citations inside of nowiki tags
  • Edit of talk page
  • Hyphens to dashes problem
  • Duplicating jstor
  • Citation unrendered because of syntax error
Have fun! – Jonesey95 (talk) 13:37, 3 October 2015 (UTC)
Thanks! Most of those weren't touched by this part of the code, but I added a couple of them as examples to make sure the part I was modifying didn't change them. On the strange duplicate parameter issue when comments are present, the current version of the bot doesn't do that, at least on testwiki: https://test.wikipedia.org/w/index.php?title=User%3AFhocutt_%28WMF%29%2FCitation_bot_test&type=revision&diff=243602&oldid=243601 . --Fhocutt (WMF) (talk) 22:57, 9 October 2015 (UTC)

{{notabug}}

Doesn't expand cite journal from pmid

Status: Not a bug
Reported by: RoadTrain (talk) 17:06, 31 May 2016 (UTC)
Relevant diffs/links: CYP4F12
Replication instructions: Click 'Citations' and the bot won't expand cite journal only having pmid. It works on other pages I used it on.
We can't proceed until: Agreement on the best solution


That's probably because these refs were inside {{PBB_Summary}} template. Some user already filed them.--RoadTrain (talk) 22:02, 31 May 2016 (UTC)

Strike the probably, it's definitely because of that. Next time you can simply remove the opening and closing {{}} of the template, let the bot run, and put them back before saving (I tried exactly that and it works). Imo you can mark the issue as fixed (teaching the bot how to work inside templates would probably be considered a new feature, and they don't accept new feature requests), but it's your choice of course. Ihaveacatonmydesk (talk) 22:16, 31 May 2016 (UTC)
This is a duplicate of the comments bug. This has nothing to do with the {{PBB_Summary}}. Flagging as {{notabug}}, so that the bot will archive this duplicate bug. AManWithNoPlan (talk) 15:13, 21 July 2016 (UTC)

Bot unnecessarily adding last2, last3, last4, ... parameters

Status: unresolved ongoing bug
Reported by: Boghog (talk) 19:48, 25 October 2014 (UTC)
Type of bug: Deleterious
What happens: When the full author list is stored in |author=, the bot adds |last2=, |last3=, |last4=, ... without the corresponding |first2=, |first3=, |first4=, ...
What should happen: If |author= contains the full author list, then the bot should not add |last2=, |last3=, |last4=, ... parameters
Relevant diffs/links: diff, diff, diff, diff, diff, diff, ...
We can't proceed until: Bot operator's feedback on what is feasible
Requested action from maintainer: If |author= contains a complete author list, do not unnecessarily add |last2=, |last3=, |last4=, ...



This is essentially the same bug that was previously reported here, but it is still occurring. Boghog (talk) 19:48, 25 October 2014 (UTC)

 Not a bug. Or perhaps, only to the extent that you consider the "lastn" (etc.) parameters to be errors.
 There have been several discussions on stuffing "full author lists" into the [c]author[s] parameters (e.g.: Module_talk:Citation/CS1/Archive_9#Coauthors_2). That is certainly a dubious practice, and perhaps it is time to deprecate it. But this is not the place for that. ~ J. Johnson (JJ) (talk) 20:19, 25 October 2014 (UTC)
Absolutely a bug. It is redundant and unnecessary to add lastn parameters to these citations. Even if one wanted such parameters, the bot has done this incorrectly by not also adding the corresponding firstn parameters and removing the author parameter. Furthermore this "correct" behavior would be in clear violation of WP:CITEVAR. The use of a single author parameter to store full Vancouver formatted author lists is widely used and has long been accepted. The bot should not make changes to citations that were not asked for. What is dubious practice is firstn, lastn nonsense that clutters up Wikitext to generate meta data that no one uses or should use. Wikipedia is not a reliable source and this extends to citations. Boghog (talk) 21:05, 25 October 2014 (UTC)
There is no redundancy in separating a list of authors into individual authors, or splitting an author's last name (which is the basis for alphabetizing) from the rest. And if you think that the "lastn" and "firstn" parameters are "nonsense that clutters up Wikitext" you probably aren't happy using citation templates in the first place, so perhaps you should just use straight text, manually formatted. Except, of course, where use of templates has already been established. ~ J. Johnson (JJ) (talk) 22:13, 25 October 2014 (UTC)
Please look carefully at the diffs above. What citation bot has done is add lastn parameters and leave the existing author parameter in place. As a consequence, author last names are now listed twice, once in the author parameter and again in lastn parameters. That is redundancy. "You probably aren't happy using citation templates in the first place" – nothing could be further from the truth. I quite often convert non-templated citations to {{cite journal}} templated citations (see this diff, there are thousands of similar examples in my edit history). Furthermore I maintain this template filling tool that generates {{cite journal}} formatted citations. Splitting author data between firstn, lastn parameters is an excessive level of data granularity that becomes unwieldy if there are a large number of authors. The Vancouver system provides a compact comma delimited list that unambiguously defines authors' first and last names. Boghog (talk) 05:40, 26 October 2014 (UTC)
Any use of any of the citation templates in the wikitext (i.e., within <ref>...</ref> tags) introduces clutter, so it is inconsistent for you to object to "clutter" only in regard to author parameters. (As a side note: I find all bibliographic details clutter the text, which is why I put them into a separate section.) If your complaint is that, after adding "lastn" parameters, the bot failed to remove the corresponding "author" parameters, then I would concur that is a bug. But that is just what you are asking for: to retain the questionable "author" parameters. As to splitting the author names: that is not a bug, that is the intended result. ~ J. Johnson (JJ) (talk) 21:41, 26 October 2014 (UTC)
Good, we now both agree that there is a bug, but your solution is a clear violation of WP:CITEVAR. Storing full author lists in a single author parameter has not been deprecated and you have not explained why this use is questionable. Quite to the contrary, the use of "firstn, lastn" parameters becomes ridiculous if there are a large number of authors. Furthermore it is completely unnecessary if the author list follows the Vancouver system. The reason to use templates is so that the data can easily be parsed to provide a consistent rendering of the citations, wiki links, maintained by bots, etc. The Vancouver system author format can easily be parsed without the need for verbose firstn, lastn parameters. It is just that the maintainers of Module:Citation/CS1 have resisted doing so. ({{vcite2 journal}} provides a possible way forward if functionality were added to this template to parse the author parameter to internally generate firstn, lastn parameters). I also occasionally use list defined references, but some editors object to these because it splits the text from the supporting sources. Regardless of whether these lists are a good idea or not, most articles don't use these. Finally, templates should be concise, containing no more overhead than is necessary to do their job. I see the value of separate parameters for title, journal name, date, etc. but splitting author data into firstn, lastn parameters is an excessive and unnecessary degree of data granularity. So I disagree that I am being inconsistent. Boghog (talk) 03:35, 27 October 2014 (UTC)
Neither is deprecated. The right behaviour is to standardize on the predominant type, or failing that, leave the existing form. The bot is doing neither, but that is not the egregious part. The bot is populating the same author in both ways, creating the appearance of two authors of the same name! LeadSongDog come howl! 03:48, 27 October 2014 (UTC)
This is ridiculous, the bot needs to be halted until this issue is fixed. The bot shouldn't add lastn, firstn, author, authors and similar unless none of the above is specified already, that's it, I don't get why it's even a matter of discussion. Ihaveacatonmydesk (talk) 17:16, 27 October 2014 (UTC)
No, we don't agree. Or rather: I will agree there is a bug if you agree that it is in retention of misused "author" parameters. But obviously you don't. If you want to argue about appropriate "data granularity" or "clutter", fine, but those aren't bugs, so this is the wrong place. ~ J. Johnson (JJ) (talk) 19:25, 27 October 2014 (UTC)
But yes, we do agree :-). We just need to replace |author= with |vauthors= and convince the maintainers of Module:Citation/CS1 to parse the latter to internally create firstn, lastn parameters. Agreed? A brilliant idea that should make everyone happy. With this solution, we can reduce the clutter while still generating clean metadata and fully supporting |authorlinkn=, |display-authors=, etc. We can also introduce error checking to make sure the content of |vauthors= is compliant with the Vancouver system. Boghog (talk) 20:43, 27 October 2014 (UTC)
Sorry, still no. The core issue is "data granularity" (as you call it), and particularly whether multiple authors ("author lists") should be allowed in a single parameter. (And possibly including whether authors' names should be split into first/last.) Whether the parameter involved is |author=, |authors=, |coauthor=, |coauthors=, |vauthors=, or any other parameter, is immaterial. ~ J. Johnson (JJ) (talk) 21:32, 27 October 2014 (UTC)
It appears that you have taken the position that it is self-evident that authors' names must be split into different parameters, without providing a shred of evidence that this is true. The only rational reason for maintaining such a position is that it is essential for parsing and error checking the data, and as I have explained above, neither is true. The Vancouver system provides an unambiguous method for parsing author data. When the data is formatted in this style, explicit firstn, lastn parameters become superfluous. These parameters can be generated internally on-the-fly. Boghog (talk) 22:00, 27 October 2014 (UTC)
No, but the issue of whether to split or not is deep enough it should be split off from the specifics of this bot's behavior. ~ J. Johnson (JJ) (talk) 20:51, 30 October 2014 (UTC)
You still have not explained why splitting is essential, but I would agree this is not the place to have this discussion. Boghog (talk) 07:51, 31 October 2014 (UTC)
@Boghog: "Just parse it automatically" is a bad idea. See http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ for why. —David Eppstein (talk) 21:50, 27 October 2014 (UTC)
I totally agree. That is precisely the reason for proposing |vauthors= so that this type of parsing is only done when this specific parameter is specified. Boghog (talk) 22:06, 27 October 2014 (UTC)
Another essential part of the proposal is error checking. The string would only be parsed if it conforms to the Vancouver system. If it does not conform, an error is thrown. Boghog (talk) 22:37, 27 October 2014 (UTC)
It's not a matter of agreeing or not agreeing: the page is fine here https://en.wikipedia.org/w/index.php?title=Wishful_thinking&oldid=584965755#References, then the bot intervenes and the templates are throwing errors https://en.wikipedia.org/w/index.php?title=Wishful_thinking&oldid=584966975#References. So it is a bug. Ihaveacatonmydesk (talk) 20:32, 27 October 2014 (UTC)
edit: I took that as an example but the examples boghog linked are better suited. My example just proves that this bot has a history of messing with authors parameters and the devs just won't (or can't) fix its behavior. Ihaveacatonmydesk (talk) 20:36, 27 October 2014 (UTC)
Thanks Ihaveacatonmydesk. I didn't realize that some of the bot edits resulted in throwing errors. The problem is worse than I thought. This needs to be fixed immediately. Boghog (talk) 20:55, 27 October 2014 (UTC)
That was an old version of the bot, that particular issue might have been fixed. Still, I consider the issues it creates now as dire as those I linked. A bot should never need this amount of babysitting. Ihaveacatonmydesk (talk) 21:04, 27 October 2014 (UTC)
I would agree that this bot has been troublesome, which is why I tend to block it from pages I work on. I would also favor blocking it. ~ J. Johnson (JJ) (talk) 20:55, 30 October 2014 (UTC)

I believe that the bug described here is a duplicate of one described above. I have found that e-mailing the bot's maintainer is more effective than posting here at eliciting a response to requests perceived as urgent. In the meantime, the undo link is always available to you, and there are instructions for blocking the bot from specific articles displayed on the bot's user page. – Jonesey95 (talk) 00:01, 28 October 2014 (UTC)

It's not the bot that's been troublesome, so much as that needed behaviour of the bot keeps being shifted by changes to the template code. That said, the bot hasn't made an edit since 25 October, so there's no panic needed. LeadSongDog come howl! 21:14, 30 October 2014 (UTC)
Even if the addition of lastn was not a bug (and it clearly is), firstn should be added. (And it's still happening.) — Arthur Rubin (talk) 10:49, 26 November 2014 (UTC)
Hello! Anyone there??? Boghog (talk) 20:43, 12 December 2014 (UTC)

Workaround based on {{vcite2 journal}}

As a follow-up to the above discussion, a new {{vcite2 journal}} template with an optional |vauthors= parameter has been recently created. This close variant of {{cite journal}} supports assignment of multiple authors in Vancouver format (a comma separated list containing no semi colons or periods) to a single |vauthors= parameter that generates clean author metadata. In all other respects, {{vcite2 journal}} is identical to {{cite journal}}. Hence I would request that instead of adding last2, last3, last4, ... parameters to citations with Vancouver style author format that the bot instead replace {{cite journal | author}} with {{vcite2 journal | vauthors}}. Boghog (talk) 16:02, 6 January 2015 (UTC)

Since support for |vauthors= has now been added to all Citation Style 1 templates, it is no longer necessary to use {{vcite2 journal}}. Hence it should only be necessary to replace the |author= parameter name with |vauthors=. Boghog (talk) 08:01, 5 September 2015 (UTC)
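To make the format concrete, here is a rough Python sketch of splitting a Vancouver-style list (comma separated, no semicolons or periods) into last-name and initials pairs, with an error raised for non-conforming input. The regular expression and the function name `split_vauthors` are assumptions made for illustration; the actual check performed by the CS1 templates is not reproduced here and may be stricter or looser.

```python
import re

# Assumed shape of one name: a surname made of letter groups (joined by a
# space, apostrophe or hyphen) followed by one or two uppercase initials.
VANCOUVER_NAME = re.compile(r"^[^\W\d_]+(?:[ '\-][^\W\d_]+)*\s[A-Z]{1,2}$")


def split_vauthors(value):
    """Split a Vancouver-style author list into (last, initials) pairs.

    Raises ValueError when the list does not match the assumed format.
    """
    if ";" in value or "." in value:
        raise ValueError("semicolons and periods are not allowed")
    authors = []
    for name in (part.strip() for part in value.split(",")):
        if not VANCOUVER_NAME.match(name):
            raise ValueError(f"not in Vancouver format: {name!r}")
        last, initials = name.rsplit(None, 1)
        authors.append((last, initials))
    return authors
```

Failing loudly on anything unexpected, rather than guessing, keeps the parsing confined to values that were written in this format on purpose.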

Handling multiple authors


@Boghog, Materialscientist, and Ryan Kaldari (WMF): I've been looking into the way the bot handles and expands multiple authors. The main issues seem to come from an odd choice to reassign several parameters (including authors and coauthor(s)) to author2, which I have temporarily fixed. There are also some hiccups when expanding "et al."--for some formattings of author lists, the list of names is not recognized as a list, so it thinks the list is a single author and fetches the rest of the author names because it looks like there are missing parameters.

My questions:

  • Are there any changes which should be made for multiple authors?
  • How should "et al."/"and others" be handled? Should they be left as-is? Should they be expanded when adding authors as new parameters? Should they be expanded from an existing author parameter? Is there any current consensus on this?
  • If there are no cases where changes should be made for multiple authors, I propose a rule of "if a single author-related parameter is already present, it should not be changed and no more author parameters should be added, even if the rest of the citation is expanded". How does this sound? --Fhocutt (WMF) (talk)
If there is any author information already in the citation template, I think it's a minefield for us to try to modify it, especially now that Vancouver-style author lists are part of the mix. Even if we wrote code to cover all the dozens of contingencies, one of them would probably break before the month was out. I like your suggested rule. It seems like the best plan to me. I'm even reluctant for us to support adding author data in the case where no author parameters are currently present, but I guess there's a lot less chance to totally screw up the citation in that case. The worst we can do is add authors in the wrong style for the article, which hopefully humans will not mind fixing too much. Ryan Kaldari (WMF) (talk) 01:34, 15 September 2015 (UTC)
I think that copy/pasting author names can be enough of a pain, particularly for citation styles that do use individual first/last params, that it's a good idea to automatically insert some author details, even if it's not in precisely the right way. How about et al.? I suspect that runs into arguments about how much data/metadata to include in citations, but if having the rest of the authors included automatically is useful then it might make sense for the bot to handle that case. It's a complicated enough one, though (can have all authors et al., first + rest of authors et al., various coauthor params...) that it may be best to just not touch it. Thoughts? --Fhocutt (WMF) (talk) 02:51, 15 September 2015 (UTC)
First do no harm. I agree that it is best not to touch it. Boghog (talk) 03:45, 15 September 2015 (UTC)
I am hopeful that you are aware of |display-authors=etal in the CS1 style. If you find more authors in a particular citation, maybe you should just leave the existing authors alone and then add that, if display-authors isn't already set. I--of course--prefer more metadata to less, so, that's a separate solution. --Izno (talk) 03:48, 15 September 2015 (UTC)
(edit conflict) Using "et al." in an author parameter will put an article in Category:CS1 maint: Explicit use of et al., a maintenance category. The proper way to generate "et al." in a citation is to use the |display-authors= or |display-editors= parameter. See {{Cite_web#Display_options}} for more details.
As for your proposed rule, I would rather see the bot remove the content of a lone author parameter and then fill in all of the authors. Editors could then choose to display as many as they want by adding |display-authors=. If you go with the proposed rule, it would need to be accompanied by a way to force filling in additional author names. For example, with some versions of the Citation Bot, you could often remove the content of one or more parameters to persuade the bot to re-fill those parameters.
On a similar note, see at least one bug report section above about Citation Bot's limit of four editors. I can explain that to you at length in a separate thread if you like; the "Display options" section linked above has a short explanation. – Jonesey95 (talk) 03:51, 15 September 2015 (UTC)
I'd prefer this: fill in all authors if there are none (using the most "stable" format), don't touch authors if anything is already filled in. This way the bot will not generate new errors. Editors willing to refill authors could blank the author fields, and those who prefer other author formats might use Help:Citation_tools or ask to create additional gadgets like those. Materialscientist (talk) 04:04, 15 September 2015 (UTC)
@Materialscientist: How are those editors to be alerted to the fact that other authors exist? Perhaps a hidden comment appended to the existing author field? LeadSongDog come howl! 18:58, 15 September 2015 (UTC)
Just click the doi/pmid/etc. Materialscientist (talk) 22:27, 15 September 2015 (UTC)
@LeadSongDog: The "et al." in one or more of the author parameters, and the presence in the maintenance category, should indicate that there are more authors. --Fhocutt (WMF) (talk) 21:07, 15 September 2015 (UTC)
Well, "just click" does nothing in the edit window, and only comes into play once rendered. Worse, most editors will simply presume the existing citation is already correct. They need some cue to tell them to check "this" one, of all the citations in an article. Where the bot detects a discrepancy between the citation and the crossref/pubmed/etc database, it should drop a hint for humans, since it won't be making the revision itself. LeadSongDog come howl! 03:35, 16 September 2015 (UTC)
The proper place for et al. is not the author or editor name-holding parameters; use |display-authors=etal and/or |display-editors=etal.
—Trappist the monk (talk) 21:16, 15 September 2015 (UTC)
I'm aware of this. However, per discussion above this tool is going to leave that for humans or less fragile/better maintained citation tools to fix. --Fhocutt (WMF) (talk) 21:43, 15 September 2015 (UTC)
Yes, good idea. I have cleaned up not a few instances of mushedtogetherauthors and find that there are enough subtleties that I think any parsing of authors should be the specific task of a dedicated, specialized bot. Similarly for the presence of explicit "et al.": that likely requires additional information beyond what is immediately present, which is probably more of a stretch than Citation bot should attempt. ~ J. Johnson (JJ) (talk) 19:21, 16 September 2015 (UTC)
Comparison of author parameters
Feature                     |lastn=, |firstn=    |vauthors=    |authors=
Clean author metadata       Yes                  Yes           No
|author-link= support       Yes                  Yes           No
|displayauthors= support    Yes                  Yes           No
Author error checking       No                   Yes           No
Compact                     No                   Yes           Yes
@J. Johnson: Please consider using the |vauthors= parameter (see table to the right) that is supported by CS1 style citation templates and easily parses Vancouver style comma delimited "mushedtogetherauthors". Why insist on an "absurdnumberofsuperfluousauthorparameters"? (see rationale) Boghog (talk) 20:05, 16 September 2015 (UTC)
Boghog: |vauthors= is specific for Vancouver style, which I – and most other editors outside of the medical topics – do not use. I would also dispute your implicit claim that vauthors= provides "clean author metadata", or that last/first does not allow "author error checking". I also deem the rationale for vauthors to be incorrect. However, all that is off-topic for this discussion. My point is that tackling any kind of "authors" problem is sufficiently challenging that it ought to be handled separately from other kinds of cleanup. ~ J. Johnson (JJ) (talk)
@J. Johnson: "I would dispute your implicit claim that |vauthors= provides clean author metadata" – that is not an implicit claim, that is an explicit claim (see explanation) that really does work. Please test |vauthors= with a metadata harvester like Zotero. Author first and last names are cleanly parsed and passed on to external citation manager applications. "I would dispute ... last/first does not allow 'author error checking'": there is no standardization of the rendered output of the |firstn= parameter, whereas |vauthors= insists on a standard format (first initial + an optional middle initial and in no cases, periods). I agree that a separate bot that respects WP:CITEVAR should handle author parameter cleanup. Boghog (talk) 21:21, 16 September 2015 (UTC)
Again, this is not the place to discuss the relative merits of vauthors=, etc. But I am pleased we agree on having a separate bot for handling authors. ~ J. Johnson (JJ) (talk) 21:31, 16 September 2015 (UTC)

Flagging as {{notabug}}, since it seems to be resolved now and is no longer doing this. AManWithNoPlan (talk) 15:42, 9 August 2016 (UTC)

Where the heck is the current source code?

The main page lists two repositories, and a Google search finds others from other people for unknown reasons (we can call those suppositories instead of repositories). Both repositories seem to have been updated in the last year. AManWithNoPlan (talk) 18:23, 12 August 2016 (UTC)

The source code is at https://github.com/ms609/citation-bot. Kaldari (talk) 00:31, 13 August 2016 (UTC)

{{notabug}}

Bot should add more than four editors and add displayeditors=29 if there are exactly 4 editors

Status: new bug / feature request (two related features in one request)
Reported by: Jonesey95 (talk) 23:49, 21 September 2013 (UTC)
Type of bug: Improvement
What happens: Bot limits editors to four first names and four last names.
What should happen: Bot should retrieve all editors and add "displayeditors=29" parameter if there are exactly four editors.
Replication instructions: Run the citation expander on a citation that has four editors listed but more than four editors in the original work.
We can't proceed until: Bot operator's feedback on what is feasible
Requested action from maintainer: Remove four-editor limit from bot code and add "displayeditors=29" to citations with exactly four editors.


The bot should add "displayeditors=29" if there are exactly four editors to avoid the Lua error described for exactly 9 authors above. – Jonesey95 (talk) 23:49, 21 September 2013 (UTC)
Is this still a bug? AManWithNoPlan (talk) 20:43, 6 August 2016 (UTC)

It is fixed, but DOItools.php still needs a change. After this line:
     "editor4", "editor4-author", "editor4-first", "editor4-link",

add these lines

     "editor5", "editor5-author", "editor5-first", "editor5-link",
     "editor6", "editor6-author", "editor6-first", "editor6-link",
     "editor7", "editor7-author", "editor7-first", "editor7-link",
     "editor8", "editor8-author", "editor8-first", "editor8-link",
     "editor9", "editor9-author", "editor9-first", "editor9-link",
     "editor10", "editor10-author", "editor10-first", "editor10-link",
     "editor11", "editor11-author", "editor11-first", "editor11-link",
     "editor12", "editor12-author", "editor12-first", "editor12-link",
     "editor13", "editor13-author", "editor13-first", "editor13-link",
     "editor14", "editor14-author", "editor14-first", "editor14-link",

and so on AManWithNoPlan (talk) 03:54, 7 August 2016 (UTC)
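The repeated entries above follow a regular pattern, so the list could also be generated instead of typed out. The Python sketch below only illustrates that idea: DOItools.php is PHP, the function name `editor_parameter_names` is hypothetical, and the upper bound of 14 simply mirrors the last line shown above.

```python
def editor_parameter_names(first=5, last=14):
    """Build the editor parameter names for editors `first` through `last`.

    The four suffixes copy the pattern of the existing "editor4" line.
    """
    suffixes = ("", "-author", "-first", "-link")
    return [
        f"editor{n}{suffix}"
        for n in range(first, last + 1)
        for suffix in suffixes
    ]
```

Raising `last` extends the list without another round of copy and paste.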

{{resolved}} It has been fixed for a long time. AManWithNoPlan (talk) 15:36, 12 October 2016 (UTC)


unnecessary addition of |DUPLICATE_page= parameter

Status: new bug
Reported by: Trappist the monk (talk) 12:11, 26 June 2016 (UTC)
Type of bug: Inconvenience: Humans must clean up after the bot
What happens: |DUPLICATE_page= causes Module:Citation/CS1 to display a redundant error message
What should happen: when a template has both |page= and |pages=, the bot should do nothing
Relevant diffs/links: Dinophysis norvegica
We can't proceed until: Agreement on the best solution



Without |DUPLICATE_page= parameter:

Dahl, E; Naustvoll, LJ; Gustad, E (14 Aug 2012). "Monitoring of Dinophysis species and diarrhetic shellfish toxins in Flødevigen Bay, Norway: inter-annual variability over a 25-year time-series". Food Additives & Contaminants: Part A. 29 (10): 1605. doi:10.1080/19440049.2012.714908. PMID 22891979. {{cite journal}}: More than one of |pages= and |page= specified (help)

with |DUPLICATE_page= parameter:

Dahl, E; Naustvoll, LJ; Gustad, E (14 Aug 2012). "Monitoring of Dinophysis species and diarrhetic shellfish toxins in Flødevigen Bay, Norway: inter-annual variability over a 25-year time-series". Food Additives & Contaminants: Part A. 29 (10): 1605. doi:10.1080/19440049.2012.714908. PMID 22891979. {{cite journal}}: More than one of |pages= and |page= specified (help); Unknown parameter |DUPLICATE_page= ignored (help)

—Trappist the monk (talk) 12:11, 26 June 2016 (UTC)

Has the older messaging been turned off? A citation with a duplicate date that I corrected here doesn't display the old message that duplicate dates are present in a particular citation, in a version previous to the bot correction. However, the first, misused, date value is still overwritten by the second. The new marking of duplicate page/date/etc. with a specious parameter is actually more helpful, as it points out the specific citation involved (try searching a long article, without the requisite knowledge of regexes, to find such duplication otherwise). Dhtwiki (talk) 04:16, 28 June 2016 (UTC)
If you go to that older version and click edit and then click Show preview, you will get the more-than-one-value-for-the-"date"-parameter error message. Is that what you mean by the 'old message'?
MediaWiki gives only one of any parameter to any template so the templates themselves cannot know that there is a duplicate parameter. In the particular case of the cs1|2 templates, they can detect the use of aliases (|page= is more-or-less an alias of |pages=) because the parameter names are different. The bot is unnecessarily duplicating the cs1|2 redundant parameter detection and messaging when it adds the |DUPLICATE_page= parameter.
Because duplicate parameters (where the parameter names are the same) are caught by MediaWiki and categorized in Category:Pages using duplicate arguments in template calls, it seems to me that the bot should stop adding the |DUPLICATE_<whatever>= parameter to cs1|2 templates.
—Trappist the monk (talk) 10:55, 28 June 2016 (UTC)
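The distinction drawn above, between same-name duplicates that the template never sees and alias pairs that it can detect, can be illustrated with a minimal sketch. This is not MediaWiki or Module:Citation/CS1 code; `collapse_params` is a hypothetical helper and the `pages` value is made up for illustration.

```php
<?php
// Hypothetical helper: collapse "name=value" pairs the way a template call
// does, where a later value silently replaces an earlier one with the same name.
function collapse_params(array $pairs): array {
    $params = [];
    foreach ($pairs as [$name, $value]) {
        $params[$name] = $value; // a repeated name overwrites the earlier value
    }
    return $params;
}

$seen = collapse_params([
    ['date', '2012'],
    ['date', '14 Aug 2012'],   // same name: only this value survives
    ['page', '1605'],
    ['pages', '1605–1620'],    // different name (alias): both survive
]);

// The template receives one date, so it cannot know a duplicate existed,
// but it receives both page and pages, so it can flag the alias clash itself.
$alias_clash = isset($seen['page'], $seen['pages']);
```

Under these assumptions the template only ever has enough information to report the alias case, which is why a bot-added marker adds nothing for |page= versus |pages=.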
Yes, it must be the message in Preview mode, and I missed it with the edit in question. However, that is a too-general warning. I recently painstakingly looked through one lengthy article with a similar, Preview-mode warning for a duplicate date in cite-news, without finding the duplication. The presence of a bot that makes changes to a particular citation, although error-generating, seems to me to be an improvement. Dhtwiki (talk) 11:45, 28 June 2016 (UTC)
I agree that error messages at the top of the preview are only vaguely helpful. There are tools listed at Category:Pages using duplicate arguments in template calls that might be helpful the next time you are confronted with that kind of error. But duplicate parameters of the same name are not the issue of this bug report. Having the bot flag parameters with identical names can be beneficial; not so beneficial when the templates are already flagging aliases.
—Trappist the monk (talk) 13:32, 28 June 2016 (UTC)
A quick look at the bot's contribs shows it wreaking havoc with "unknown parameter" errors. I have no idea what it's trying to accomplish, but I'm fairly certain this is not an improvement. ―Mandruss  04:42, 30 June 2016 (UTC)
I reverted one case and the bot returned 4 minutes later and did it again. Bot needs to be shut down now please. ―Mandruss  04:50, 30 June 2016 (UTC)
The bot is simply turning an invisible error into a visible error, moving the article from one error category, where the error is difficult to find and fix, to another, where the error is easier to find and fix. This is not "wreaking havoc". Also, I suspect that this "bot" is being activated by a human editor, not running automatically on its own, but I don't know if there is a way to know this. – Jonesey95 (talk) 05:00, 30 June 2016 (UTC)
Yeah, I was premature. I now understand the rationale, I retract my statements, and I have fixed the duplicate in that one case. But this discussion should have occurred before implementation, not after, as there is more than one way to skin a cat. I'm not at all convinced that a more-visible "unknown parameter" error is better than a less-visible, but at least descriptive, "duplicate parameter" error. I'm not aware of any other cases where errors are flagged by introducing invalid parameters into template transclusions. This is not the only option available to us. ―Mandruss  05:08, 30 June 2016 (UTC)
Citation bot has had this feature for many years, but this |DUPLICATE_foo= stuff was only output when someone ran Citation bot manually on an article. That's why I think that someone is manually running the bot, essentially as a script, against the duplicate parameter category, which has about 5,000 articles left in it (down from well over 100,000 when the category was created). As I said, I don't know if there is a way of verifying that or knowing who is doing it.
In any event, I am finding the "unsupported parameter" versions of the citations much easier to fix than the articles in the "duplicate parameter" category, because you can do a find for "dupl" in the article and jump immediately to the problem citation. This one was cute: reference 27 had an ISBN in a duplicated title parameter for at least six years, displaying the ISBN instead of the title. Citation bot helped us find and fix the problem. – Jonesey95 (talk) 05:33, 30 June 2016 (UTC)
I don't dispute that this is an adequate solution, maybe the best one, for an editor with 7 years and 70K edits, who understands the technical background (you). Maybe even so for one with 3 years and 25K edits who understands the technical background (me). Not necessarily for the other 80%+ of the editing population. For example, this bot could follow the example of Cyberbot II and post an explanation on the article's talk page, leaving the citation alone. The explanation could list both duplicate parameter values, making the cite almost as easy to locate with browser Find.
As currently written, the bot has to arbitrarily choose one of the values to invalidate, and it may be the wrong one. It was the wrong one in the case I corrected: the other |title= value was not a title but an editorial description of the page. So the bot's bold edit will be an improvement for about 50% of cases. In the other half, the bot will eliminate an error and replace it with incorrect information. Human attention is needed in each case, and this doesn't get it except for the relatively rare cases where an editor knows what this odd use of "unknown parameter" means, knows how to fix it, and cares to take the time to fix it. The talk page is more visible than little errors in the References section, and it provides space for clear explanation. ―Mandruss  06:03, 30 June 2016 (UTC)
Re "arbitrarily...": The bot always tags the first duplicated parameter, which is the one that is not displayed in the citation. That preserves the rendered citation while adding the error message. There is no way that the bot could choose the "right" parameter to mark as a duplicate. – Jonesey95 (talk) 14:27, 30 June 2016 (UTC)
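The tagging rule described above (mark the earlier occurrence, keep the one that is actually displayed) could be sketched roughly as follows. This is an illustration only, not the bot's implementation; `tag_duplicates` and the sample values are hypothetical.

```php
<?php
// Hypothetical sketch: rename every earlier occurrence of a repeated parameter
// name to DUPLICATE_<name>, leaving the last occurrence (the one the template
// displays) untouched so the rendered citation does not change.
function tag_duplicates(array $pairs): array {
    $last = [];
    foreach ($pairs as $i => $pair) {
        $last[$pair[0]] = $i; // remember the final position of each name
    }
    $out = [];
    foreach ($pairs as $i => $pair) {
        $name = ($last[$pair[0]] === $i) ? $pair[0] : 'DUPLICATE_' . $pair[0];
        $out[] = [$name, $pair[1]];
    }
    return $out;
}

$tagged = tag_duplicates([
    ['title', 'first value'],
    ['title', 'second value'],
    ['year', '2012'],
]);
```

Note that the sketch has no way to judge which value is correct; it only preserves what was already being rendered, which is the limitation discussed in this thread.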
I disagree that this is necessarily a problem. These duplicate parameters can be hard to find in long articles, and the bot's conversion of one of the parameters to "DUPLICATE_parameter" makes the error stand out and helps editors who would otherwise not notice the errors find them and fix them. If there are any developers reading this page, I would like them to work on other bug fixes and feature requests before tackling this one. – Jonesey95 (talk) 14:43, 28 June 2016 (UTC)

{{notabug}} AManWithNoPlan (talk) 15:31, 12 October 2016 (UTC)

Citation_bot puts '# # # comment placeholder # # #'

Status
new bug
Reported by
Wikid77 (talk) 22:20, 2 August 2016 (UTC)
Type of bug
Inconvenience
What happens
Citation_bot inserts text "|# # # citation bot : comment placeholder 0 # # #journal =" (as text generated inside the {cite_journal} parameters).
Relevant diffs/links
Diff from 07:17, 10 July 2016, in page "State Shinto": https://en.wikipedia.org/w/index.php?title=State_Shinto&diff=729148482&oldid=726231213
Replication instructions
(unsure)
We can't proceed until
Agreement on the best solution


Citation_bot is duplicating the parameter "journal=" in a {cite journal} that contains the comment code "<!--xxx-->", inserting the text "|# # # citation bot : comment placeholder 0 # # #journal =" (generated inside the {cite_journal} parameters). This bug was reported 6 months earlier (botching the same page), on 5 February 2016; see dif5594. -Wikid77 (talk) 22:20, revised 22:36, 2 August 2016 (UTC)

Strange. This bug has come back. AManWithNoPlan (talk) 03:58, 3 August 2016 (UTC)
Is this not a special case of #Comments cause trouble? --Izno (talk) 13:10, 3 August 2016 (UTC)
I think this is a different bug, previously reported, unresolved, and archived for some reason. – Jonesey95 (talk) 13:25, 3 August 2016 (UTC)
It is clearly a related bug. Obviously there is some code that tries to avoid the comment bug by encoding comments like this and then fails to decode them. AManWithNoPlan (talk) 15:10, 3 August 2016 (UTC)

This is all coming from this code AManWithNoPlan (talk) 19:56, 7 August 2016 (UTC):

class Comment extends Item {
  const placeholder_text = '# # # Citation bot : comment placeholder %s # # #';
  const regexp = '~<!--.*-->~us';
  const treat_identical_separately = FALSE;
  
  public function parse_text($text) {
    $this->rawtext = $text;
  }
  
  public function parsed_text() {
    return $this->rawtext;
  }
}

Note that the letter case of the placeholder above ("Citation bot") does not match the text left behind in the bug ("citation bot"). The code that fails is in objects.php AManWithNoPlan (talk) 20:13, 7 August 2016 (UTC):

  protected function replace_object ($objects) {
    $i = count($objects);
    if ($objects) foreach (array_reverse($objects) as $obj) 
      $this->text = str_replace(sprintf($obj::placeholder_text, --$i), $obj->parsed_text(), $this->text);
  }

Note that the replace is case-sensitive. In situations like this bug, where the capitalization of the placeholder was changed (by title-casing or the like), the replacement fails. The solution is:

  protected function replace_object ($objects) {
    $i = count($objects);
    if ($objects) foreach (array_reverse($objects) as $obj) 
      $this->text = str_ireplace(sprintf($obj::placeholder_text, --$i), $obj->parsed_text(), $this->text);
  }
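The difference between the two versions can be checked in isolation. The sketch below uses the placeholder format quoted from the Comment class and the lower-cased text from the bug report; the journal value is a made-up stand-in.

```php
<?php
// Placeholder as the bot generates it (format string quoted from the Comment class).
$placeholder = sprintf('# # # Citation bot : comment placeholder %s # # #', 0);
$comment = '<!--xxx-->';

// Text as it appeared in the reported diff: "Citation" had become "citation".
// "Example Journal" is a hypothetical value.
$text = '|# # # citation bot : comment placeholder 0 # # #journal = Example Journal';

// Case-sensitive replace finds nothing, so the placeholder stays in the page.
$sensitive = str_replace($placeholder, $comment, $text);

// Case-insensitive replace restores the original comment.
$insensitive = str_ireplace($placeholder, $comment, $text);
```

This only shows why the placeholder survives once its case has changed; it does not identify which step of the bot changed the case in the first place.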

Also, in public function write() in objects.php, after this code:

      if ($my_page->lastrevid != $this->lastrevid) {
        echo "\n ! Possible edit conflict detected. Aborting.";
        return FALSE;
      }

add this code:

      // stripos() returns 0 for a match at offset 0, so compare strictly against false.
      if (stripos($this->text, "Citation bot : comment placeholder") !== false) {
        echo "\n ! Comment placeholder left escaped. Aborting.";
        return FALSE;
      }

This will make sure that we never have the bug again. Of course, the bot will fail to work on such pages, so the real solution is to make sure that every escaping is un-escaped. AManWithNoPlan (talk) 03:45, 7 August 2016 (UTC)
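A guard of this kind is safest with a strict comparison, because `stripos()` returns the integer 0 when the match starts at the very beginning of the text, and 0 compares loosely equal to false. In practice the real placeholder begins with "# # # ", so a match at offset 0 is unlikely here; the sketch below, with a contrived input, only illustrates the general pitfall.

```php
<?php
$needle = 'Citation bot : comment placeholder';

// Contrived text in which the needle sits at offset 0.
$at_start = 'citation bot : comment placeholder 0 # # #';

// Loose comparison: 0 != false evaluates to false, so the match is missed.
$loose = (stripos($at_start, $needle) != false);

// Strict comparison: 0 !== false evaluates to true, so the match is detected.
$strict = (stripos($at_start, $needle) !== false);
```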

{{Resolved}} AManWithNoPlan (talk) 14:35, 12 October 2016 (UTC)

Google dates are not in a standard format

Status
Feature Request
Reported by
Keith D (talk) 00:14, 19 August 2016 (UTC)
Type of bug
Inconvenience: Humans must occasionally make immediate edits to clean up after the bot
What happens
Added invalidly formatted date to cite
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Umrah&diff=735160730&oldid=733744115
We can't proceed until
Agreement on the best solution
Requested action from maintainer
Change the bot so that it does not add date entries that trigger an error condition. It should use an en dash, not a hyphen, to join date parts such as 2015–16. In this case, however, it should translate the value to November 2009, since the two numbers are not consecutive years.


Google has date as "2009.11". The bot changes dots to dashes, which is an improvement over what google gives it. This is vaguely a minor version of the google books data is rubbish bug. AManWithNoPlan (talk) 03:14, 19 August 2016 (UTC)
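The requested behaviour could be sketched as below, assuming that a Google value of the form "YYYY.MM" always means year and month. `normalize_google_date` is a hypothetical helper, not a function in the bot.

```php
<?php
// Hypothetical helper: turn a "YYYY.MM" value into "Month YYYY".
// Anything that does not match that exact shape is returned unchanged.
function normalize_google_date(string $raw): string {
    $months = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
               'August', 'September', 'October', 'November', 'December'];
    if (preg_match('~^(\d{4})\.(\d{1,2})$~', $raw, $m)) {
        $month = (int) $m[2];
        if ($month >= 1 && $month <= 12) {
            return $months[$month - 1] . ' ' . $m[1];
        }
    }
    return $raw; // leave unrecognised formats for a human to check
}

$fixed = normalize_google_date('2009.11');
$untouched = normalize_google_date('2009');
```

Returning unrecognised input unchanged is a deliberate choice in this sketch: it avoids turning one questionable date into a different questionable date.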

{{resolved}} It seems to do the right thing now. AManWithNoPlan (talk) 15:33, 12 October 2016 (UTC)

Stop Citation_bot adding DUPLICATE_xxx

This bot should be STOPPED until it can be fixed, as it still adds unneeded "DUPLICATE_title" (etc.) even though there is the "Category:Pages using duplicate arguments in template calls" (in cites, infoboxes), and still treats lone parameters as if duplicate when cite contains an HTML comment "<!-- -->" with no duplicate keywords. Meanwhile, the flooding of cite categories hides other pages with real overlooked cite errors, such as vandalism to cite parameters, tracked in category:

  • "Category:Pages with citations using unsupported parameters" (where "DUPLICATE_xx" appear)

Because of the flooding of that unsupported-parameter category by Citation_bot, it took 5 days to fix a vandalized cite page (among 120 listed), which could encourage vandals to hack more pages which can remain botched for 5 days. A flooded category often can prolong errors for months/years in semi-major pages (re: "The Band Perry" listed down under "T"), because cite errors are mainly fixed by wp:wikignomes clearing all pages from a cite-error category, where typical editors almost never fix 90% of red-error cite problems. Stop Citation_bot. -Wikid77 (talk) 14:58, 30 September 2016 (UTC)

The fixed code is on GitHub (I know, since I wrote the fix). Someone with the necessary access needs to upload it to Wikipedia. AManWithNoPlan (talk) 15:05, 30 September 2016 (UTC)
Wikid77, there is no "flooding." 100 articles is not flooding the "unsupported parameter" category, which currently contains just 18 pages. I fix articles in that category most days, as do other editors.
The articles with duplicate parameters have errors. They are just being moved from one error category (that is widely ignored, and to which your vandalism example applies much more accurately) to another error category that is regularly cleaned out, and flagging the errors in red makes them easier to find and fix by searching quickly for the string "duplicate". Also, there are only about 1,800 articles left in the duplicate parameter category, so even if they were all moved into the unsupported parameter category, it wouldn't take that long to fix them. – Jonesey95 (talk) 15:47, 30 September 2016 (UTC)

Thanks, Jonesey95, for helping to fix those hundreds of pages in the unsupported category. Because it then contained only 19 pages, I was able to fix the numerous recent hack edits to popular U.S. TV star "Estelle Getty" within 3 hours, after User:Citation_bot had recently linked over 250 pages into that category:

"Category:Pages with citations using unsupported parameters" (where "DUPLICATE_xx" appear)

For many editors, fixing those hundreds of pages for parameters "DUPLICATE_xxx" is very tedious because the linked url+titles or dates or publisher must be verified by downloading source pages or PDF documents or googling printed books and scanning for title/date markings to ensure the duplicate is not the original, or in some cases both dates or titles must be fixed, unlike a simple parameter spelling error, such as "tittle=" as "title=" or "frist2=" as "first2=" etc. Hence, the generated cite errors for DUPLICATE_xx are often much harder to fix (and users have complained), plus Citation_bot leaves other duplicate parameters in the same pages and does not solve all the duplication problems, just obscures the unsupported-parameters category by 6x as many pages with complex errors often 10-times harder to fix, as effectively flooding the category by a 60x-heavier workload (when fixed properly). Meanwhile, after fixing several hundred duplicate parameters, I have found almost no vandalism (or other parameter errors) in pages with duplicates, but 1-in-10 misspelled, unsupported parameters seem to be caused by severe hack edits affecting other sections of a page. The largest amount of hacked cites are in unsupported parameters, not in duplicate parameters often caused by a 2nd date in ISO format, a 2nd (sub)title, an alternate URL, a 2nd publisher agency, or a nearby valid author/date also called "title". Citation_bot is obscuring simple fixes by escalating complex duplication issues into the wrong, smaller category. -Wikid77 (talk) 07:25, 4 October 2016 (UTC)

Your rationale makes no sense. The error exists regardless; either we can surface it easily for editors, or not. I agree with Jonesey in this regard. --Izno (talk) 11:10, 4 October 2016 (UTC)
You are partially incorrect. Until the new source code on GitHub is loaded onto the Wikipedia tool servers, the bot will continue to wrongly add DUPLICATE. Those cases are the real problem. AManWithNoPlan (talk) 13:36, 4 October 2016 (UTC)
@AManWithNoPlan: Jonesey and I are saying it's a feature and not a bug. --Izno (talk) 13:42, 4 October 2016 (UTC)
I agree that it is a great feature, but sometimes it adds DUPLICATE when there is no duplicate, because of comments. AManWithNoPlan (talk) 14:10, 4 October 2016 (UTC)

{{notabug}} AManWithNoPlan (talk) 14:35, 12 October 2016 (UTC)

Bot should add more than four editors and add displayeditors=29 if there are exactly 4 editors

Status
new bug / feature request (two related features in one request)
Reported by
Jonesey95 (talk) 23:49, 21 September 2013 (UTC)
Type of bug
Improvement
What happens
Bot limits editors to four first names and four last names.
What should happen
Bot should retrieve all editors and add "displayeditors=29" parameter if there are exactly four editors.
Replication instructions
Run the citation expander on a citation that has four editors listed but more than four editors in the original work.
We can't proceed until
Bot operator's feedback on what is feasible
Requested action from maintainer
Remove four-editor limit from bot code and add "displayeditors=29" to citations with exactly four authors.


The bot should add "displayeditors=29" if there are exactly four editors to avoid the Lua error described for exactly 9 authors above. – Jonesey95 (talk) 23:49, 21 September 2013 (UTC)
Is this still a bug? AManWithNoPlan (talk) 20:43, 6 August 2016 (UTC)

It is fixed, but in DOItools.php, after this line:
     "editor4", "editor4-author", "editor4-first", "editor4-link",

add these lines:

     "editor5", "editor5-author", "editor5-first", "editor5-link",
     "editor6", "editor6-author", "editor6-first", "editor6-link",
     "editor7", "editor7-author", "editor7-first", "editor7-link",
     "editor8", "editor8-author", "editor8-first", "editor8-link",
     "editor9", "editor9-author", "editor9-first", "editor9-link",
     "editor10", "editor10-author", "editor10-first", "editor10-link",
     "editor11", "editor11-author", "editor11-first", "editor11-link",
     "editor12", "editor12-author", "editor12-first", "editor12-link",
     "editor13", "editor13-author", "editor13-first", "editor13-link",
     "editor14", "editor14-author", "editor14-first", "editor14-link",

and so on AManWithNoPlan (talk) 03:54, 7 August 2016 (UTC)
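Rather than extending the list by hand, the same parameter names could be generated in a loop. The sketch below is an assumption about how that might look, not code from DOItools.php; `editor_parameters` is a hypothetical helper, and the four suffixes simply mirror the lines quoted above.

```php
<?php
// Hypothetical helper: build the editorN parameter names for editors $from..$to,
// using the same four variants per editor as the hand-written list.
function editor_parameters(int $from, int $to): array {
    $names = [];
    for ($n = $from; $n <= $to; $n++) {
        array_push(
            $names,
            "editor{$n}",
            "editor{$n}-author",
            "editor{$n}-first",
            "editor{$n}-link"
        );
    }
    return $names;
}

// Reproduces the editor5 through editor14 lines shown above.
$params = editor_parameters(5, 14);
```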

{{resolved}} It has been fixed for a long time. AManWithNoPlan (talk) 15:36, 12 October 2016 (UTC)

Stop Citation_bot adding DUPLICATE_xxx

This bot should be STOPPED until it can be fixed, as it still adds unneeded "DUPLICATE_title" (etc.) even though there is the "Category:Pages using duplicate arguments in template calls" (in cites, infoboxes), and still treats lone parameters as if duplicate when cite contains an HTML comment "<!-- -->" with no duplicate keywords. Meanwhile, the flooding of cite categories hides other pages with real overlooked cite errors, such as vandalism to cite parameters, tracked in category:

  • "Category:Pages with citations using unsupported parameters" (where "DUPLICATE_xx" appear)

Because of the flooding of that unsupported-parameter category by Citation_bot, it took 5 days to fix a vandalized cite page (among 120 listed), which could encourage vandals to hack more pages which can remain botched for 5 days. A flooded category often can prolong errors for months/years in semi-major pages (re: "The Band Perry" listed down under "T"), because cite errors are mainly fixed by wp:wikignomes clearing all pages from a cite-error category, where typical editors almost never fix 90% of red-error cite problems. Stop Citation_bot. -Wikid77 (talk) 14:58, 30 September 2016 (UTC)

The fixed code is on github (I know, since I wrote the fix). Some one with power needs to upload it to wikipedia. AManWithNoPlan (talk) 15:05, 30 September 2016 (UTC)
Wikid77, there is no "flooding." 100 articles is not flooding the "unsupported parameter" category, which currently contains just 18 pages. I fix articles in that category most days, as do other editors.
The articles with duplicate parameters have errors. They are just being moved from one error category (that is widely ignored, and to which your vandalism example applies much more accurately) to another error category that is regularly cleaned out, and flagging the errors in red makes them easier to find and fix by searching quickly for the string "duplicate". Also, there are only about 1,800 articles left in the duplicate parameter category, so even if they were all moved into the unsupported parameter category, it wouldn't take that long to fix them. – Jonesey95 (talk) 15:47, 30 September 2016 (UTC)

Thanks, Jonesey95, for helping to fix those hundreds of pages in the unsupported category. Because it then contained only 19 pages, I was able to fix the numerous recent hack edits to popular U.S. TV star "Estelle Getty" within 3 hours, after User:Citation_bot had recently linked over 250 pages into that category:

"Category:Pages with citations using unsupported parameters" (where "DUPLICATE_xx" appear)

For many editors, fixing those hundreds of pages with "DUPLICATE_xxx" parameters is very tedious. The linked URL and title, dates, or publisher must be verified by downloading source pages or PDF documents, or by googling printed books and scanning for title/date markings, to ensure the duplicate is not the original; in some cases both dates or both titles must be fixed. That is unlike a simple parameter spelling error, such as "tittle=" for "title=" or "frist2=" for "first2=". Hence, the generated cite errors for DUPLICATE_xx are often much harder to fix (and users have complained). In addition, Citation_bot leaves other duplicate parameters in the same pages and does not solve all the duplication problems; it just obscures the unsupported-parameters category with 6x as many pages, containing complex errors that are often 10 times harder to fix, effectively flooding the category with a 60x-heavier workload (when fixed properly). Meanwhile, after fixing several hundred duplicate parameters, I have found almost no vandalism (or other parameter errors) in pages with duplicates, but 1 in 10 misspelled, unsupported parameters seems to be caused by severe hack edits affecting other sections of a page. The largest number of hacked cites is in unsupported parameters, not in duplicate parameters, which are often caused by a 2nd date in ISO format, a 2nd (sub)title, an alternate URL, a 2nd publisher agency, or a nearby valid author/date also called "title". Citation_bot is obscuring simple fixes by escalating complex duplication issues into the wrong, smaller category. -Wikid77 (talk) 07:25, 4 October 2016 (UTC)

Your rationale makes no sense. The error exists regardless; either we can surface it easily for editors, or not. I agree with Jonesey in this regard. --Izno (talk) 11:10, 4 October 2016 (UTC)
You are partially incorrect. Until the new source code on GitHub is loaded onto the Wikipedia tool servers, the bot will continue to wrongly add DUPLICATE. Those are the real problem. AManWithNoPlan (talk) 13:36, 4 October 2016 (UTC)
@AManWithNoPlan: Jonesey and I are saying it's a feature and not a bug. --Izno (talk) 13:42, 4 October 2016 (UTC)
I agree that it is a great feature, but sometimes it adds DUPLICATE when there is no duplicate, because of comments. AManWithNoPlan (talk) 14:10, 4 October 2016 (UTC)

{{notabug}} AManWithNoPlan (talk) 14:35, 12 October 2016 (UTC)
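
For readers trying to picture the comment-related false positives described above, here is a minimal sketch of one way a duplicate-parameter check can avoid being misled by HTML comments. It is not the bot's actual code and not the fix on GitHub; the function name `duplicate_params`, the regular expression, and the sample citations are all hypothetical, and the failure mode shown (comment text being parsed as a parameter) is only one plausible cause, assumed for illustration.

```python
import re
from collections import Counter

# Matches an HTML comment, including comments that span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def duplicate_params(template_text):
    """Return the sorted parameter names used more than once in a template call.

    Simplification (assumption): the call is split on every "|", so nested
    templates and piped wikilinks are not handled.
    """
    # Strip comments first, so their text is never parsed as parameters.
    text = COMMENT_RE.sub("", template_text).strip()
    if text.startswith("{{") and text.endswith("}}"):
        text = text[2:-2]
    parts = text.split("|")[1:]  # drop the template name
    names = []
    for part in parts:
        if "=" in part:
            name = part.split("=", 1)[0].strip()
            if name:
                names.append(name)
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical citations, for illustration only.
with_comment = "{{cite web |title=Example <!-- |title=old draft --> |url=http://example.com}}"
real_duplicate = "{{cite web |title=A |title=B |url=http://example.com}}"

print(duplicate_params(with_comment))
print(duplicate_params(real_duplicate))
```

With the comment removed before parsing, the first citation reports no duplicates, while the second still reports "title" as a genuine duplicate that an editor would need to resolve by hand.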
