Benutzer Diskussion:Stefan Kühn/Check Wikipedia/Archiv/2009/Juli

Last comment: 15 years ago by JoRobot in section Error #69

Exclusion list

Hello,

I think a per-error exclusion list (pages not scanned by your script) would help us. Some examples (discussed on the French wiki):

  • for error 37, when we don't want to add DEFAULTSORT to articles titled with sinograms, kanji, ...
  • for error 30, when some files should not have a description (especially image links in some templates named {{Infobox ...}}) ...
  • for links to other namespaces (on the discussion page, we agreed that sometimes a change is needed and sometimes not)


And also for false positives such as fr:Travail des enfants and fr:Oiseau (squelette).

We could add these pages (after verifying them) to an exclusion list per error (like Projet:Correction syntaxique/Exclusion list/30 for error 30, ...), and your script would ignore the pages listed in the exclusion list. What do you think about that?

Thanks. Al1 16:46, 1. Jul. 2009 (CEST)

Hello Al1, we also had a discussion in DE about a whitelist for every error, and now you want a whitelist in FR as well. I had the idea at WikiProject Check Wikipedia/Whitelist#36. In the next months I will try to make this possible. At the moment I am very busy (privately and at work). In the meantime, please write the exclusions in the description (like in EN). -- sk 22:36, 1. Jul. 2009 (CEST)
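A minimal sketch of how per-error exclusion lists could be applied before a report is written. The page format (one "* [[Title]]" line per excluded article) and the function names are assumptions for illustration, not part of the existing script.

```python
import re

# Assumed page format: one "* [[Title]]" line per excluded article, e.g. on a
# page such as "Projet:Correction syntaxique/Exclusion list/30".
ENTRY = re.compile(r"^\*\s*\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", re.MULTILINE)


def parse_exclusion_page(wikitext):
    """Return the set of titles listed on one exclusion page."""
    return {m.group(1).strip() for m in ENTRY.finditer(wikitext)}


def filter_reports(reports, exclusions):
    """Drop (error_number, title) pairs whose title is excluded for that error only."""
    return [
        (error, title)
        for error, title in reports
        if title not in exclusions.get(error, set())
    ]


page_30 = "* [[Travail des enfants]]\n* [[Oiseau (squelette)]]\n"
exclusions = {30: parse_exclusion_page(page_30)}
reports = [(30, "Travail des enfants"), (30, "Chat"), (37, "Travail des enfants")]
print(filter_reports(reports, exclusions))
# [(30, 'Chat'), (37, 'Travail des enfants')]
```

Because the lists are kept per error, an article excluded from error 30 is still checked for error 37.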

Error 083 - possible bug (WP in Italian)

Hello, this short message is to inform you that the Check Wikipedia script run on the Italian Wikipedia flagged an error 083 (Headlines start with three "=" and later with level two) in the article "it:Episodi di Pocket Monsters Diamond & Pearl", where the first headline actually starts with two "=" but is placed inside a "noinclude" pair like this: <noinclude>== Title ==</noinclude>. The article was created by "stripping" part of the content from a very long original article, and the "noinclude" is needed for correct handling of nested articles. IMHO, if possible and if this does not conflict with other processing in the script, the "noinclude" should be ignored so that the script detects the logically correct sequence of headline levels. Thank you very much and keep up this precious work. -- L736E 18:46, 8. Jul. 2009 (CEST)

Hello L736E, I think I need this detection of "noinclude" for other things. But I also think this headline inside a noinclude is a bug in the article. At the moment I have no idea how to fix this. -- sk 21:21, 13. Jul. 2009 (CEST)
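The difference between the two behaviours can be sketched as follows. For illustration only, this assumes the script currently drops everything inside <noinclude>...</noinclude>; the alternative removes only the tags, as L736E suggests, so the heading inside stays visible to the level check. Heading parsing is simplified (one heading per line, levels 2 to 6).

```python
import re

# A whole <noinclude>...</noinclude> block, content included.
NOINCLUDE_BLOCK = re.compile(r"<noinclude>.*?</noinclude>", re.DOTALL | re.IGNORECASE)
# Only the opening or closing tag.
NOINCLUDE_TAG = re.compile(r"</?noinclude>", re.IGNORECASE)
# A heading line such as "== Title ==" (levels 2 to 6, one heading per line).
HEADING = re.compile(r"^(={2,6})(.+?)\1\s*$", re.MULTILINE)


def heading_levels(text, drop_noinclude_content):
    """Return heading levels in order, after handling <noinclude> one of two ways."""
    if drop_noinclude_content:
        text = NOINCLUDE_BLOCK.sub("", text)
    else:
        text = NOINCLUDE_TAG.sub("", text)
    return [len(m.group(1)) for m in HEADING.finditer(text)]


def error_083(levels):
    """First headline is level 3 and a level-2 headline follows later."""
    return bool(levels) and levels[0] == 3 and 2 in levels[1:]


sample = "<noinclude>== Title ==</noinclude>\n=== Episode 1 ===\n== Season 2 ==\n"
print(error_083(heading_levels(sample, drop_noinclude_content=True)))   # True
print(error_083(heading_levels(sample, drop_noinclude_content=False)))  # False
```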

Error #033

Hello Stefan. As far as I can see here, there is no wiki markup that replaces underlined text (<u>), so what is the use of this error? --Superyetkin 18:08, 13. Jul. 2009 (CEST)

See here. The underline tag will not be supported in future versions of HTML. If you really need underlining in an article, it should be done with a span. This is XHTML-conformant. -- sk 21:37, 13. Jul. 2009 (CEST)
I'm sorry to interrupt, but that is bullshit. You have no right to force people to use span tags instead of u tags. MediaWiki should simply keep supporting u tags, end of story. Follow the KISS principle. -- chemiewikibm cwbm 22:31, 13. Jul. 2009 (CEST)
You can disable this check on your wiki if you don't want to use it. Just set the priority in the _xx part of the translation text. --Vina 08:51, 14. Jul. 2009 (CEST)
Thanks for the clarification, Stefan. I would really appreciate it if you answered my other query about error #003 above. Cheers! --Superyetkin 23:08, 13. Jul. 2009 (CEST)

Error #040 in Japanese Wikipedia

Hello, Stefan!

In jawiki there are a lot of HTML font tags, but the script reports none. They were reported up to the 2009-01-30 version, but the problem has occurred since the next version (2009-02-11). Best regards! --Mymelo 03:30, 11. Jul. 2009 (CEST)

Thanks for this info. In other languages there are no errors either. I will check this in the script; maybe I can fix it. -- sk 21:35, 13. Jul. 2009 (CEST)
Many thanks for your comment. --Mymelo 13:40, 18. Jul. 2009 (CEST)
 Ok, I have changed the script a little bit. Now I search not only for "<font>" but also for "<font ...". Maybe this helps. We will see tomorrow.
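A minimal sketch of the difference between searching only for the literal "<font>" and searching for "<font" followed by optional attributes. The pattern names are illustrative; the expression actually used in the script is not shown in this thread.

```python
import re

# Narrow search: only the literal tag "<font>".
BARE_FONT = re.compile(r"<font>", re.IGNORECASE)
# Wider search: "<font", a word boundary, then any attributes up to ">".
FONT_TAG = re.compile(r"<font\b[^>]*>", re.IGNORECASE)

sample = '<font color="red">赤</font> and <FONT size=2>x</FONT> and <font>y</font>'
print(len(BARE_FONT.findall(sample)))  # 1
print(len(FONT_TAG.findall(sample)))   # 3
```

The word boundary keeps the wider pattern from matching unrelated tags such as "<fontx>".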

Many suggestions

Hello Mr. Kühn,

While editing the French Wikipedia, I found many possible errors. Here is an example of each:

  • The HTML entity &#x2200; (∀) should be converted to the Unicode character \u2200.
  • The HTML entity &#2200; (࢘) should be converted to the Unicode character \u0898.
  • Many HTML entities, like &Eacute; (É), should be converted to Unicode. However, &nbsp; must be excluded, since it has legitimate uses. A list of such entities is given by the Web Design Group. I have the full list in a JavaScript file and can send it to you (I use it in my Firefox extension, Weekedit). They are also listed on EN.WP: en:List of XML and HTML character entity references.
  • If the title of the article is PSoC, the sort key should be {{DEFAULTSORT:Psoc}}.
  • A category sort key with a diacritic, like [[Catégorie:Acteur français|Depardieu, Gérard]], is bad.
  • The wikilink [[fractale|fractales]] should be shorter: [[fractale]]s.

Regards,

Cantons-de-l'Est 03:19, 17. Jul. 2009 (CEST)

Very interesting ideas. I will try to add these to my script. -- sk 22:01, 20. Jul. 2009 (CEST)
A comment: I don't think the last one is a good idea, because it will increase the number of false positives of a spell-checker. --129.215.104.155 12:36, 21. Jul. 2009 (CEST)
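Two of the suggestions above can be sketched with Python's standard library: decoding character references with html.unescape while keeping &nbsp;, and collapsing [[fractale|fractales]] into [[fractale]]s. Also keeping &lt;, &gt; and &amp; is an added assumption (decoding them could turn text into markup), and the link-trail letters are limited to a to z here, although the allowed set differs per language.

```python
import html
import re

# &nbsp; is kept as requested above; keeping &lt;, &gt; and &amp; is an added
# assumption, because decoding them could turn plain text into markup.
KEEP = {"nbsp", "lt", "gt", "amp"}
ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities(text):
    """Replace named and numeric character references by the characters themselves."""
    def replace(match):
        if match.group(1) in KEEP:
            return match.group(0)
        return html.unescape(match.group(0))
    return ENTITY.sub(replace, text)


# [[target|target + suffix]] -> [[target]]suffix. The suffix is limited to a-z
# here; the letters allowed in a link trail differ from language to language.
PIPED_WITH_SUFFIX = re.compile(r"\[\[([^\[\]|]+)\|\1([a-z]+)\]\]")


def shorten_links(text):
    return PIPED_WITH_SUFFIX.sub(r"[[\1]]\2", text)


print(decode_entities("&Eacute;t&eacute; a&nbsp;b"))  # Été a&nbsp;b
print(shorten_links("[[fractale|fractales]] et [[chat|chien]]"))  # [[fractale]]s et [[chat|chien]]
```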

Error #003 in Japanese Wikipedia

Hello, Stefan.

In jawiki there are reports for error #003, but 2 of the articles already have a reference template.

ja:国際水泳連盟 uses the template {{脚注リスト}}, which is a new redirect to the {{Reflist}} template. Please update your script's Japanese localization. However, I could not find the problem in ja:八王子市. Best regards. --Mymelo 13:11, 18. Jul. 2009 (CEST)

I will add this next weekend. -- sk 22:02, 20. Jul. 2009 (CEST)
I will also check ja:八王子市 at the weekend. At the moment I can't find a problem. -- sk 22:05, 20. Jul. 2009 (CEST)
Stefan, could you be so kind as to do the same (update your script) for the Turkish wiki as well? Actually, I had mentioned this before (see my posts above), but you do not seem to have noticed them at all. Thanks for your help. --Superyetkin 22:48, 20. Jul. 2009 (CEST)
There is only one script. It works for all languages. -- sk 22:12, 3. Aug. 2009 (CEST)
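A sketch of a localized check for error #003 (references without a reference list). The per-language table and its layout are assumptions; only {{Reflist}} and the jawiki redirect {{脚注リスト}} come from this discussion. With a single script, adding a template name to such a table would cover every wiki that lists it.

```python
import re

# Hypothetical per-language table of templates that render a reference list.
REFERENCE_TEMPLATES = {
    "en": ["Reflist"],
    "ja": ["Reflist", "脚注リスト"],
}

REF_OPEN = re.compile(r"<ref[\s>/]", re.IGNORECASE)        # <ref>, <ref name=...>
REFERENCES_TAG = re.compile(r"<references\s*/?>", re.IGNORECASE)


def missing_reference_list(text, lang):
    """True if the text uses <ref> but has no <references/> tag or list template."""
    if not REF_OPEN.search(text):
        return False
    if REFERENCES_TAG.search(text):
        return False
    for name in REFERENCE_TEMPLATES.get(lang, []):
        if re.search(r"\{\{\s*" + re.escape(name) + r"\s*[|}]", text, re.IGNORECASE):
            return False
    return True


ja_text = "本文<ref>出典</ref>\n== 脚注 ==\n{{脚注リスト}}\n"
print(missing_reference_list(ja_text, "ja"))  # False
```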

Undetected templates for #34

Hello, your script should detect the use of {{#ifexist:a|b}} templates, like here. JAn Dudík 08:50, 23. Jul. 2009 (CEST)

 Ok, I have added this to the script. -- sk 22:08, 3. Aug. 2009 (CEST)
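A sketch of detecting template programming elements such as {{#ifexist:...}} in article text. Only #ifexist comes from the request above; the other names in the pattern are common ParserFunctions, and which of them the script actually flags is an assumption.

```python
import re

# Longer names come first so "ifexist" is not cut short by "if".
PARSER_FUNCTION = re.compile(
    r"\{\{\s*#(ifexist|ifexpr|ifeq|iferror|if|switch|expr|time)\s*:", re.IGNORECASE
)
TEMPLATE_PARAMETER = re.compile(r"\{\{\{")  # {{{1}}} and similar


def template_programming_elements(text):
    """Return the parser functions (and "{{{" parameters) found in article text."""
    found = [m.group(1).lower() for m in PARSER_FUNCTION.finditer(text)]
    if TEMPLATE_PARAMETER.search(text):
        found.append("{{{")
    return found


print(template_programming_elements("{{#ifexist:Foo|[[Foo]]|bar}} {{{1}}}"))
# ['ifexist', '{{{']
```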

Special characters in interwiki

Can you detect special characters in interwiki links, as in this edit? JAn Dudík 11:04, 23. Jul. 2009 (CEST)

Good idea. I have added this to my to-do list. -- sk 08:59, 4. Aug. 2009 (CEST)
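The edit referred to is not preserved here, so it is unclear which characters were meant. Assuming the problem is percent-encoded (%C3%9F) or HTML-entity (&#32;) sequences in interlanguage link targets, a check could look like this; the prefix pattern is also a simplification.

```python
import html
import re
from urllib.parse import unquote

# Simplified interlanguage link such as [[de:Titel]] or [[zh-yue:...]].
INTERWIKI = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\]|]+)\]\]")
# Assumed "special characters": percent escapes or character references.
SPECIAL = re.compile(r"%[0-9A-Fa-f]{2}|&[#A-Za-z0-9]+;")


def suspicious_interwikis(text):
    """Return (prefix, target, decoded_target) for targets containing escapes."""
    result = []
    for m in INTERWIKI.finditer(text):
        prefix, target = m.groups()
        if SPECIAL.search(target):
            result.append((prefix, target, html.unescape(unquote(target))))
    return result


print(suspicious_interwikis("[[de:Stra%C3%9Fe]] [[fr:Rue]]"))
# [('de', 'Stra%C3%9Fe', 'Straße')]
```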

More flexible error list?

Dear Stefan!

At the moment, one of your programs reads the file http://toolserver.org/~sk/checkwiki/huwiki/huwiki_translation.txt and puts the content of the error_XXX_head_script fields into the final error list. If no error is found, the field content is copied unchanged; otherwise it is surrounded with [[...]] markers.

I would like to add additional info to this table column, but the above mechanism does not allow it. However, I have a suggestion: if you find a template called chkwiki here, you should not add square brackets but overwrite the first parameter with the error count. I mean something like this:

error_XXX_head_script={{chkwiki|count=N|errno=XXX|msg0=cell_text_A|msg1=cell_text_B}}

The error count should be put in place of N. Then we could write arbitrary templates that change the displayed text according to the error count. Sky is the limit. :-)
If the translated text does not begin with {{chkwiki|count=N|..., your program would apply the current algorithm. This way, compatibility with current-style translations is preserved.

What is your opinion? -- Bitman 08:01, 29. Jul. 2009 (CEST)

The problem is that every language would need this template, and every change would have to be made in all languages. At the moment we have more than 30 languages, and in the future there will be more. I think this is not practicable. -- sk 09:57, 4. Aug. 2009 (CEST)

Uhmm... I don't understand what you mean. Could you show an example? AFAIK the solution I suggested is totally independent of the number of checked wikis and languages. It is not necessary to write such a template in every wiki. If somebody needs it, they use it. Other wiki maintainers don't need to care about it; they get the current internationalised error list. -- Bitman 18:45, 4. Aug. 2009 (CEST)

Of course, national templates are created and maintained by local people; you would have nothing to do with them. --Bitman 18:55, 4. Aug. 2009 (CEST)
Ok, I understand. You mean I should update the script so that every language can use its own template inside the translation. If I understand you right, the problem is the [[...]]. But I don't understand what you want to do with this template. I use "error_XXX_head_script" only as a headline and inside the statistics table. Please describe the "sky" to me in more detail :-) -- sk 21:50, 4. Aug. 2009 (CEST)
Yes, the problem is that there is no way to adapt to the [[...]] that your program places (or doesn't). A localized template, however, would apply square brackets (or not) depending on the error count, while the other elements of the table cell remain fixed.
Actually, I want to add a warning icon to items that are bot-correctable, so human editors would not waste their time fixing these trivial errors.
Another advantage: currently, after fixing all errors of a certain kind, our editors manually remove the wikilink, leaving plain text in the table cell. It is a bit faster and easier to change the template parameter count from 11 to 0.
The sky? Version 2.0 of the template may also insert a smiley into the cell of solved errors. --Bitman 16:25, 7. Aug. 2009 (CEST)
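A minimal sketch of the proposed rendering rule, assuming a cell text that starts with "{{chkwiki|count=N|" gets N replaced by the error count, while any other text keeps the current behaviour (unchanged when there are no errors, wrapped in [[...]] otherwise). The function name is hypothetical.

```python
import re

# "{{chkwiki|count=N|..." at the start of the cell text, as proposed above.
CHKWIKI_COUNT = re.compile(r"^(\{\{\s*chkwiki\s*\|\s*count\s*=\s*)N(?=\s*[|}])")


def render_head(head_text, error_count):
    """Render one error_XXX_head_script cell (sketch of the proposal above)."""
    if CHKWIKI_COUNT.match(head_text):
        # Template style: put the count in place of N; the template decides the rest.
        return CHKWIKI_COUNT.sub(lambda m: m.group(1) + str(error_count), head_text, count=1)
    # Current behaviour as described: unchanged without errors, else wrapped in [[...]].
    return head_text if error_count == 0 else "[[" + head_text + "]]"


print(render_head("{{chkwiki|count=N|errno=033|msg0=A|msg1=B}}", 11))
# {{chkwiki|count=11|errno=033|msg0=A|msg1=B}}
print(render_head("Underline", 3))  # [[Underline]]
```

With this rule a local {{chkwiki}} template can decide for itself whether to show a link, an icon or a smiley, while translations without the template render exactly as before.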

White space detection

Hello, can you add a new error: articles with long lines of text that start with whitespace? Leading whitespace can be used for scripts or something like that, but I think such lines should be shorter than, e.g., 80 characters. JAn Dudík 08:46, 23. Jul. 2009 (CEST)

I have also had this idea, but I have no good algorithm to detect it. There are too many problems at the moment, for example source or templates. -- sk 08:57, 4. Aug. 2009 (CEST)
pywikipediabot uses several exceptions for text inside various tags; maybe you can take a look at it. --Nemo bis 01:37, 12. Aug. 2009 (CEST)
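A sketch of the suggested whitespace check: report lines that start with a space and are longer than 80 characters, after skipping the content of a few tags, in the spirit of pywikipediabot's exceptions. The list of skipped tags is an assumption, and templates (one of the problems mentioned above) are not handled.

```python
import re

# Assumed exceptions: the content of these tags may legitimately be indented.
SKIP_BLOCKS = re.compile(
    r"<(pre|source|syntaxhighlight|poem)\b[^>]*>.*?</\1\s*>",
    re.DOTALL | re.IGNORECASE,
)
MAX_LENGTH = 80  # threshold suggested above


def long_indented_lines(text, max_length=MAX_LENGTH):
    """Return 1-based numbers of lines that start with a space and exceed max_length."""
    # Blank out skipped blocks but keep their newlines, so line numbers stay correct.
    text = SKIP_BLOCKS.sub(lambda m: "\n" * m.group(0).count("\n"), text)
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if line.startswith(" ") and len(line) > max_length
    ]


sample = "Intro\n " + "x" * 90 + "\n<pre>\n " + "y" * 90 + "\n</pre>\n short"
print(long_indented_lines(sample))  # [2]
```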

Error #082 on Swedish Wikipedia

Links starting with "S:", like [[S:t Lukasstiftelsen]], are not links to any other Wikimedia project on the Swedish Wikipedia, because many names in Swedish start with "S:t". Best regards! -- Lavallen 21:51, 10. Jul. 2009 (CEST)

Ohh, very interesting. I thought this was the short link to Wikisource. What do you use as the short link to Wikisource on the Swedish Wikipedia? -- sk 21:31, 13. Jul. 2009 (CEST)
The Swedish Wikipedia uses "src" as the short link to Wikisource. Elfsborgarn 13:33, 16. Jul. 2009 (CEST)
I see you deactivated this in svwiki. It would be a lot of work to include this in the script. -- sk 21:10, 18. Aug. 2009 (CEST)
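One way to handle this would be a per-language prefix table for error #082, sketched below. The table contents are assumptions except for the svwiki details from this thread ("S:" is not a project link there, and "src" is the Wikisource shortcut).

```python
import re

# Hypothetical prefix table; svwiki leaves out "s" and uses "src" instead.
PROJECT_PREFIXES = {
    "default": {"s", "wikisource", "wikt", "wiktionary", "b", "wikibooks"},
    "sv": {"src", "wikisource", "wikt", "wiktionary", "b", "wikibooks"},
}
LINK_PREFIX = re.compile(r"\[\[\s*([^:\[\]|]+):")


def links_to_other_projects(text, lang):
    """Return the prefixes of links that point to another Wikimedia project."""
    prefixes = PROJECT_PREFIXES.get(lang, PROJECT_PREFIXES["default"])
    return [
        m.group(1).strip()
        for m in LINK_PREFIX.finditer(text)
        if m.group(1).strip().lower() in prefixes
    ]


text = "[[S:t Lukasstiftelsen]] och [[src:Psaltaren]]"
print(links_to_other_projects(text, "sv"))  # ['src']
```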

False positive #81

Dear Stefan! The article hu:Stadler FLIRT contains some extremely large references with embedded tables. The tables differ in ref#17 to ref#19, but your script reports the references as identical. -- Bitman 193.6.17.154 18:58, 16. Jul. 2009 (CEST)

My script removes the tables, so the references become identical. I have never seen a reference like this. Why do you need this? I think a reference should only contain a link to a source. -- sk 20:21, 16. Jul. 2009 (CEST)

I can't answer; I'm not an editor of the article. I am writing a modular bot to repair errors discovered by your script. Repairing #81 is quite complicated but not impossible: hu:User:GumiBot/code81. I think exact detection of identical refs should be less hard. :-) -- Bitman 193.6.17.197 07:08, 17. Jul. 2009 (CEST)
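The false positive can be reproduced with a small sketch: if tables are removed before comparing reference contents, two references that differ only in their tables look identical, while comparing the full content (with whitespace collapsed) keeps them apart. The regular expressions are simplified and ignore self-closing refs.

```python
import re

REF = re.compile(r"<ref(?:\s[^>/]*)?>(.*?)</ref\s*>", re.DOTALL | re.IGNORECASE)
TABLE = re.compile(r"\{\|.*?\|\}", re.DOTALL)  # a wiki table {| ... |}


def normalize(content, strip_tables):
    if strip_tables:
        content = TABLE.sub("", content)
    return " ".join(content.split())  # collapse whitespace only


def first_duplicate(text, strip_tables=False):
    """Return the first reference content seen twice, or None."""
    seen = set()
    for m in REF.finditer(text):
        key = normalize(m.group(1), strip_tables)
        if key in seen:
            return key
        seen.add(key)
    return None


text = "<ref>Source {|\n| 1\n|}</ref><ref>Source {|\n| 2\n|}</ref>"
print(first_duplicate(text, strip_tables=True))  # Source
print(first_duplicate(text))                     # None
```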

Bug from Danish Wikipedia

After I fixed some refs flagged by #81 in a couple of articles, new ref errors from the same articles (da:Jehovas Vidner and da:Dansk køkken) appeared, but they weren't introduced by my edits. So the conclusion must be that #81 doesn't catch more than one error per article at a time :) --Anigif 23:18, 12. Aug. 2009 (CEST)

Yes, this is right. I report only the first duplicate ref, because sometimes there are many of them in one article. -- sk 21:29, 18. Aug. 2009 (CEST)
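If reporting every duplicate rather than only the first were wanted, the references could be grouped by content, as in this sketch (same simplified ref pattern, whitespace collapsed before comparison).

```python
import re
from collections import Counter

REF = re.compile(r"<ref(?:\s[^>/]*)?>(.*?)</ref\s*>", re.DOTALL | re.IGNORECASE)


def all_duplicate_refs(text):
    """Map every reference content that occurs more than once to its count."""
    counts = Counter(" ".join(m.group(1).split()) for m in REF.finditer(text))
    return {content: n for content, n in counts.items() if n > 1}


sample = "<ref>A</ref><ref>B</ref><ref>A</ref><ref name=x>B</ref><ref>C</ref>"
print(all_duplicate_refs(sample))  # {'A': 2, 'B': 2}
```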

Error #69

Hi, there may be a false positive in it:Codice ISBN, as one image name contains "ISBN-13". That's my guess; could you check it? Marcol-it

I can add this article as an excluded article. I did the same with the article ISBN in de. -- sk 22:06, 20. Jul. 2009 (CEST)
Thanks, that will be good! :) Marcol-it 18:08, 21. Jul. 2009 (CEST)
 Ok, I have fixed this. -- sk 22:11, 3. Aug. 2009 (CEST)

We get the same false positive in ca:Lector de codi de barres, and we will get it in ca:ISBN if it gets inspected. Can you please white-list them? --JoRobot 23:45, 1. Sep. 2009 (CEST)
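As an alternative to white-listing each article, file names could be removed before searching, so that an image such as "ISBN-13.png" no longer triggers the check. The ISBN rule, the list of localized file namespaces and the whitelist format below are assumptions for illustration.

```python
import re

# Assumed rule for error #069: "ISBN-10", "ISBN-13" or "ISBN:" instead of plain "ISBN".
WRONG_ISBN = re.compile(r"\bISBN(?:-1[03]\b|:)")
# Hypothetical list of localized file namespaces (en, it, ca); only the file name is removed.
FILE_LINK = re.compile(r"(\[\[\s*(?:File|Image|Immagine|Fitxer|Imatge)\s*:)[^\]|]*", re.IGNORECASE)
# Hypothetical per-wiki whitelist, e.g. the articles mentioned above.
WHITELIST = {"ca": {"Lector de codi de barres", "ISBN"}}


def isbn_syntax_error(title, text, lang):
    """True if the text (file names removed) contains the assumed wrong ISBN syntax."""
    if title in WHITELIST.get(lang, set()):
        return False
    text = FILE_LINK.sub(r"\1", text)
    return bool(WRONG_ISBN.search(text))


page = "[[File:Esempio ISBN-13.png|thumb|ISBN]] ISBN 978-88-00-00000-0"
print(isbn_syntax_error("Codice ISBN", page, "it"))  # False
```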

Other namespaces

Hi Stefan. I think the script only searches for errors in namespace 0 (main). It would be nice if, at least on eswiki, it also searched namespace 104 (Anexo:), which is used for lists and can have the same errors to fix. Can your script scan namespaces 0 and 104 in future runs? Thanks in advance! Muro de Aguas 19:07, 9. Jul. 2009 (CEST)

Hello Muro de Aguas, thanks for this info. I had never heard of namespace 104. This is very interesting. I will try to include it soon. -- sk 21:24, 13. Jul. 2009 (CEST)
I have added this to my to-do list. -- sk 22:00, 3. Aug. 2009 (CEST)
 Ok, I have included this. -- sk 21:06, 18. Aug. 2009 (CEST)
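A sketch of a per-wiki namespace setting, with namespace 0 everywhere and namespace 104 added for eswiki. The configuration names are hypothetical.

```python
# Namespace 0 (articles) everywhere, plus namespace 104 ("Anexo:") on eswiki.
DEFAULT_NAMESPACES = frozenset({0})
EXTRA_NAMESPACES = {"eswiki": frozenset({104})}


def namespaces_to_scan(project):
    return DEFAULT_NAMESPACES | EXTRA_NAMESPACES.get(project, frozenset())


def pages_to_scan(pages, project):
    """Keep the (namespace, title) pairs whose namespace is scanned for project."""
    wanted = namespaces_to_scan(project)
    return [(ns, title) for ns, title in pages if ns in wanted]


pages = [(0, "Madrid"), (104, "Anexo:Municipios"), (1, "Discusión:Madrid")]
print(pages_to_scan(pages, "eswiki"))  # [(0, 'Madrid'), (104, 'Anexo:Municipios')]
```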
This section can be archived. sk 21:00, 4. Okt. 2009 (CEST)

Suggestions from France: Image without description

This detection returns a lot of false positives. The problem was raised on the French Project:CheckWiki discussion page, and the suggestion was agreed on. When an image is used as a plain image, or in an infobox (or another template), the image description serves as alternative text. The alternative-text problem is complicated and different from the need for a caption on a "thumb" image. "Images that really need a description" and "images that don't need one" are currently mixed together (I know alternative text is needed, but that's another problem). We'd like to modify error 30; two ways have been suggested:

  1. Detect only where a description is really needed: thumbs and galleries (galleries are already detected), i.e. only when "thumb" is added to the image. Plain images and images in infoboxes (and templates) can be ignored.
  2. Split the error in two: one detection only for thumbs, and a separate detection for plain images and images in infoboxes (and templates).

The first one could be the best (at least for France), since the alternative-text problem is not going to be resolved soon. This fix could make the image problem easier. Do you agree with these changes?

Maybe ignore pictures smaller than 50 px for this? JAn Dudík 08:48, 23. Jul. 2009 (CEST)
The script is stupid: it only detects images in the text. Every image should have a description, even very small ones. Yes, this is a lot of work. The only way is to split this error: one for thumbs only, and one for the rest. -- sk 09:09, 4. Aug. 2009 (CEST)
Splitting it into two errors is fine. This would at least fix the images that really lack a description... "One for thumbs only, and one for the rest", as you say... --Archimëa 17:07, 4. Aug. 2009 (CEST)
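A sketch of the split discussed above: parse each [[File:...]] link, decide whether it is a thumb and whether it has a caption, then report either only captionless thumbs or all captionless images. The option list is a simplified English-only assumption (localized keywords such as the French "vignette" and namespaces such as "Fichier" would need to be added), and pipes inside templates in captions are not handled.

```python
import re

FILE_START = re.compile(r"\[\[\s*(?:File|Image)\s*:", re.IGNORECASE)
THUMB = {"thumb", "thumbnail"}
OPTIONS = THUMB | {"frame", "framed", "frameless", "border",
                   "left", "right", "center", "none", "upright"}
SIZE = re.compile(r"^\d*x?\d+px$")                        # 200px, x150px, 200x150px
KEYED = re.compile(r"^(alt|link|upright|page|lang)\s*=")  # keyed options, not captions


def split_link(text, start):
    """Split the link that opens at text[start] ("[[") on its top-level pipes."""
    depth, i, parts, buf = 0, start, [], []
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "[[":
            depth += 1
            if depth > 1:
                buf.append(pair)
            i += 2
        elif pair == "]]":
            depth -= 1
            if depth == 0:
                break
            buf.append(pair)
            i += 2
        elif text[i] == "|" and depth == 1:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(text[i])
            i += 1
    parts.append("".join(buf))  # last part (also used for an unterminated link)
    return parts


def classify(parts):
    """Return (is_thumb, has_caption) for the parts of one file link."""
    params = [p.strip() for p in parts[1:]]
    is_thumb = any(p.lower() in THUMB for p in params)
    has_caption = any(
        p and p.lower() not in OPTIONS and not SIZE.match(p) and not KEYED.match(p)
        for p in params
    )
    return is_thumb, has_caption


def images_without_description(text, thumbs_only=True):
    """File names of images without a caption: thumbs only, or all images."""
    missing = []
    for m in FILE_START.finditer(text):
        parts = split_link(text, m.start())
        is_thumb, has_caption = classify(parts)
        if not has_caption and (is_thumb or not thumbs_only):
            missing.append(parts[0].split(":", 1)[1].strip())
    return missing


sample = ("[[File:A.jpg|thumb]] [[File:B.jpg|thumb|A [[cat|feline]]]] "
          "[[File:C.png|20px]] [[Image:D.svg|right|200px|Map]]")
print(images_without_description(sample))                     # ['A.jpg']
print(images_without_description(sample, thumbs_only=False))  # ['A.jpg', 'C.png']
```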

Suggestions from France: Error 063

<sub><small>testo</small></sub> is detected, but <small><sub>testo</sub></small> isn't. Is this normal behaviour? --Archimëa 16:31, 23. Jul. 2009 (CEST)

Am I wrong? --Archimëa 17:07, 4. Aug. 2009 (CEST)
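If both nestings are meant to be reported, a check needs one pattern per order, as sketched below; a pattern written only for <sub><small>...</small></sub> will not match <small><sub>...</sub></small>. Whether the second order should count as error 063 is an assumption here.

```python
import re

# <small> directly inside <sub> or <sup>, e.g. <sub><small>testo</small></sub>.
SMALL_INSIDE = re.compile(r"<(sub|sup)>\s*<small>.*?</small>\s*</\1>", re.DOTALL | re.IGNORECASE)
# <sub> or <sup> directly inside <small>, e.g. <small><sub>testo</sub></small>.
SMALL_OUTSIDE = re.compile(r"<small>\s*<(sub|sup)>.*?</\1>\s*</small>", re.DOTALL | re.IGNORECASE)


def small_nestings(text):
    """Return which nesting orders of <small> and <sub>/<sup> occur in text."""
    found = []
    if SMALL_INSIDE.search(text):
        found.append("small inside sub/sup")
    if SMALL_OUTSIDE.search(text):
        found.append("sub/sup inside small")
    return found


print(small_nestings("<small><sub>testo</sub></small>"))  # ['sub/sup inside small']
```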

Suggestions from France: Output limit

We'd like to increase the output limit of errors displayed by the script from 50 to 100. The old version of the program returned 50 errors each day, while the new version only returns 50 every two days. As a consequence, the total number of errors rises, because errors are created faster than they are corrected. Could you please increase the maximum number to 100 in order to restore the previous rate?

--Archimëa 17:23, 21. Jul. 2009 (CEST)

At the moment the biggest list is enwiki's, at 272 KB. I have only one limit for all languages. Maybe I can change this for frwiki; I will try. --sk 09:16, 4. Aug. 2009 (CEST)
Ok, thanks... I wasn't sure you would agree; I thought this could increase the scan time and stress the server...
 Ok, I have changed the limit from 50 to 100 for frwiki only. -- sk 22:26, 18. Aug. 2009 (CEST)
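A sketch of the per-wiki limit, assuming the limit applies to each error list: one default for all languages, with an override for frwiki. The names are hypothetical.

```python
# One default for all languages, with the frwiki override described above.
DEFAULT_LIMIT = 50
LIMIT_OVERRIDES = {"frwiki": 100}


def output_limit(project):
    return LIMIT_OVERRIDES.get(project, DEFAULT_LIMIT)


def limit_report(articles, project):
    """Keep at most output_limit(project) articles in one error list."""
    return articles[:output_limit(project)]


print(output_limit("frwiki"), output_limit("enwiki"))  # 100 50
```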
This section can be archived. sk 21:39, 26. Okt. 2011 (CEST)