Szerkesztő:Bináris/Tartalomjegyzékbot/Bot owners' guide

This is the bot owners' guide to TOCbot, a bot written by Bináris that creates a table of contents (TOC) for the archives of a given page. For the user guide, please see this page.

Usage of the bot

Each page, together with its archives, is described by a config as explained in the user guide. There are two kinds of configs:

A central page may contain several configs. It may be either your own subpage or a community page. It was designed for community pages, as it makes it easy to manage many configs en masse. You may also use it for a limited number of user archives if you want.
Individual configs are similar to those of archivebot and are designed for user archives. They must be in the introductory part of the main page (usually a talk page), and more than one of them is allowed on a page. Users may install them individually after you have announced the service.

A central page may contain any other text, templates, categories etc. It is read line by line. The lines of each config must be consecutive; any empty or alien line ends the reading of that config. Since the page is read line by line, enclosing a config in HTML comment marks or nowiki tags has no effect if the marks are on separate lines from the config itself. To temporarily comment out a config, just spoil its first line by changing : to :* or by writing anything in front of the :.
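The reading rules above can be illustrated with a minimal Python sketch. It is not code from tocbot.py: the function name read_central_configs is hypothetical, and parameter lines are kept as raw strings because this guide does not describe how the bot parses them further.

```python
def read_central_configs(text):
    """Split central page wikitext into configs (hypothetical sketch).

    A config starts with a line beginning with a single colon (the main page),
    continues with '::' parameter lines and ':*' comment lines, and ends at the
    first empty or alien line, at the next config, or at the end of the page.
    """
    configs = []
    current = None  # config being read, or None between configs
    for line in text.splitlines():
        if line.startswith("::"):
            # Parameter line; ignored when no config is open
            # (e.g. right after a spoiled first line).
            if current is not None:
                current["params"].append(line[2:].strip())
        elif line.startswith(":*"):
            # Comment line: skipped without closing the current config.
            # A spoiled first line (":*Talk:Foo") therefore opens nothing.
            pass
        elif line.startswith(":"):
            # A single colon begins a new config and names its main page.
            if current is not None:
                configs.append(current)
            current = {"page": line[1:].strip(), "params": []}
        else:
            # Empty or alien line: stop reading the current config.
            if current is not None:
                configs.append(current)
                current = None
    if current is not None:
        configs.append(current)
    return configs
```

In this sketch, text before, between and after configs (templates, categories, prose) is simply skipped, which matches the rule that a central page may contain anything else.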

This applies to individual templates, too. You may temporarily switch them off by spoiling the first line.

There is not much technical difference between central page configs and individual configs except the way of specifying the main page. They differ mainly in their purpose. Community pages will be maintained mostly by bot owners, who may want to keep the configs together, watch their changes and run mass tests on them, while user archives are the responsibility of their owners. That's why I speak about central pages as places to collect the configs of community pages, and about individual configs as ones belonging to user archives. However, this distinction is not binding for you. Another difference is described in the section Supplementary parameters.

Steps to implement the bot

You must have a working Pywikipedia installation and some basic knowledge of Pywikipedia.
Get to know some working examples; some of them are listed in the last row of the template above.
Read the overview and the user guide carefully.
Study the collected archive name patterns; there are plenty of tested solutions with detailed explanations. None of them are artificial examples; they all come from live wikis. You may be lucky enough to find a ready-made config for one of your pages.
Please check out this page before beginning your work. Enter your name, your wiki and your bot's name in the table, and check whether anybody else from your wiki, or from another wiki in your language, is already listed. Please contact them and discuss localisation (don't use different translations for the same language, and don't work on the same community pages; there are plenty of them to work on). Fill in the last column of the table later if you run a modified version.
Visit User:Bináris/TOCbot/Newsletter. If you visit the Hungarian Wikipedia frequently, add it to your watchlist; otherwise you are advised to subscribe by logging in to the Hungarian Wikipedia and adding your signature to User:Bináris/TOCbot/Newsletter/subscribers. My bot will then send you the current news by e-mail. News will be more frequent at first and occasional later.
Download User:Bináris/tocbot.py and User:BinBot/roman.py. Open each page for editing and save the text between the <source> tags under the same file name in your Pywikipedia directory. Use a UTF-8-aware editor that saves without a BOM (so Windows Notepad is usually not suitable).
- tocbot.py is expected to become part of the Pywikipedia framework if it is accepted after a basic testing period. Saving the first version and the later modifications from my subpage is only a temporary inconvenience; later you will get the current version of the bot together with your Pywikipedia updates.
- roman.py will be imported by the bot. It is not expected to change in the near future, as it is tested and ready. If you find it inconvenient to download it this way, you may vote here for including it in the Pywikipedia framework.
If you are the first to use the bot in your language, a small localisation task follows, as described below. If you find the dictionary keys of your language code in the source and their values are correct, you are lucky.
Now you are ready to run the bot. Choose an easy task (e.g. a page that has Archive1, Archive2 etc.), create your first config and test the result. Make sure that the bot has found all the archives and nothing else, has sorted them correctly, and has recognized and sorted the dates in your language properly. (You can check this by comparing the table with the archives.)
If you have localised the code, please let me know the result so that I can build it into the official version. The simplest way is to edit User:Bináris/tocbot.py (only if it concerns language defaults!). I watch that page. To protect the code, only logged-in edits from the account you listed in the table mentioned above will be accepted; all other edits will be reverted. This helps other bot owners as well as yourself, because you can update the code more easily. Another option is to wait until the script gets into the Pywikipedia framework, and then publish your language versions through the SourceForge bug tracker.
If you use a version modified beyond pure language translations, please publish it in your wiki with a link to the original code, and add the link to the bot owners' table.
Create as many configs as you can, run them and test them. It is recommended to group pages according to their traffic, as described below; this will decrease your bot's load. Decide how often to rebuild them.
Link the TOCs to their main pages.
Announce to the community that TOCs are available and that they will be applicable to personal pages at a later time. Tell them how to make this script useful: everyone should sign their contributions and give new sections meaningful titles ("Question" or ":-)" is 100% useless).
You may want to enhance the archive template. This is just a small but useful contribution. Most wikis have a template with this picture that is used to place the list of archives on the main page. I modified it so that if there is a subpage called Tartalomjegyzék (the default Hungarian name of the subpage where TOCs are saved unless the config specifies otherwise), the template detects it and links to it. Look at it on this page or on my talk page. If there is no such subpage, nothing happens.
You may also consider translating the user guide for local users. Please link the translation to the original. Please don't copy the archive name patterns page; only link to it, as I may edit it continuously.
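If you would rather script the download step from the list above, the sketch below extracts the code between the <source> tags and writes it as UTF-8 without a BOM. It assumes you have first copied the wikitext from the edit box into a local file; the helper names extract_source and save_without_bom are hypothetical and not part of Pywikipedia.

```python
import re

# Non-greedy and DOTALL: take everything up to the first closing tag.
SOURCE_RE = re.compile(r"<source[^>]*>\n?(.*?)</source>", re.DOTALL)


def extract_source(wikitext):
    """Return the text between the first <source ...> and </source> tags."""
    match = SOURCE_RE.search(wikitext)
    if match is None:
        raise ValueError("no <source> block found")
    return match.group(1)


def save_without_bom(path, code):
    """Write code as UTF-8; the "utf-8" codec never adds a BOM ("utf-8-sig" would)."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(code)
```

For example, save_without_bom("tocbot.py", extract_source(text)), where text is the wikitext you read from your pasted local copy with encoding="utf-8".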
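When you check the archive list of your first test run, keep in mind that plain string sorting puts Archive10 before Archive2. The following sketch compares the archive list you copied from the bot's debug output with the list you expect. It is only a testing aid under stated assumptions: natural_key and check_archives are hypothetical names, and the sketch handles Arabic numbers only, not Roman numerals or date-based archive names.

```python
import re


def natural_key(title):
    """Sort key that compares runs of digits as numbers ("Archive2" < "Archive10")."""
    parts = re.split(r"(\d+)", title)
    # re.split with a capturing group puts the digit runs at the odd indices.
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


def check_archives(found, expected):
    """Compare the archives the bot found with the ones you expect.

    Returns (missing, unexpected, in_natural_order).
    """
    missing = [t for t in expected if t not in found]
    unexpected = [t for t in found if t not in expected]
    in_order = found == sorted(found, key=natural_key)
    return missing, unexpected, in_order
```

A False third value only means that the order differs from simple numeric order; for exotic naming systems, judge the order yourself.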

Localisation

(To be continued)

Creating a central config page

Create a new page anywhere and begin to write configs. Examples of configs for community pages: c1, c2, c3.

Creating a template for individual configs

These templates are very easy to create and are very similar to the templates of archivebot.py. They may be in any namespace and may contain anything. The bot never reads them; their purpose is to be transcluded, and the bot reads the transclusions, not the template itself. The only requirements are that the template must exist and that any content on the template page (except an included category) must be in nowiki, because you don't want it to appear on users' talk pages.

My template is User:BinBot/config, and its uses can be seen at Special:Whatlinkshere/User:BinBot/config. Look at the introductory part of those pages for live examples.

Announce the location of the template to your community and tell them its name is case sensitive.

Syntactical differences of central page configs and individual configs

On central pages, each config begins with a line that defines the page to work on. This line begins with one colon. Individual configs begin with the line that contains the name of the template (something like {{user:YourBot/config). This line is case sensitive! The main page is detected automatically: it is the page in which the template is placed.

On central pages, all other lines containing valid parameters begin with two colons (::), while comment lines begin with :*. This way the page remains readable by humans. In individual templates, parameter lines begin with | (the general separator of template parameters), while comment lines begin with |*.

Central page configs have no closing line. Processing ends at any empty or alien line, at the beginning of the next config, or at the end of the page. Individual templates must be closed with }} simply because they are templates, but the bot ignores this line. Processing ends at any empty or alien line, at the beginning of the next config template, or at the end of the introductory part of the page.
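For comparison with the central page rules, here is a minimal sketch of reading individual configs from a page. It is not taken from tocbot.py; the helper names are hypothetical, and it assumes that the introductory part is everything before the first line starting with "=". Note that the exact-string comparison makes the opening line case sensitive, as described above.

```python
def intro_part(wikitext):
    """Return the lines before the first section heading (assumed definition)."""
    lines = []
    for line in wikitext.splitlines():
        if line.startswith("="):
            break
        lines.append(line)
    return lines


def read_template_configs(wikitext, template_call):
    """Collect the parameter lines of each individual config in the intro.

    template_call is the opening line exactly as users must type it,
    e.g. "{{user:YourBot/config"; the comparison is case sensitive.
    The main page is not stored: it is the page being read.
    """
    configs, current = [], None
    for line in intro_part(wikitext):
        if line.rstrip() == template_call:
            # The opening line starts a new config (and ends any previous one).
            if current is not None:
                configs.append(current)
            current = []
        elif current is None:
            continue  # outside a config: ignore
        elif line.startswith("|*"):
            continue  # comment line
        elif line.startswith("|"):
            current.append(line[1:].strip())
        else:
            # "}}", an empty line or any alien line ends the config.
            configs.append(current)
            current = None
    if current is not None:
        configs.append(current)
    return configs
```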

Supplementary parameters: cfam, clang and to

This section supplements the section of the same name in the user guide.

There are two additional parameters that are not documented in the user guide:

cfam sets the wiki family in which the original page and the archives are located
clang sets the language of the wiki within cfam

Here c stands for config, to avoid confusion with the global parameters lang and family. Always give clang if you give cfam; clang may be used without cfam.

These parameters allow you to keep the config in a different wiki from the pages themselves. While the bot was being built, this feature was used to mass-test rex examples from foreign wikis on a single central page (my own subpage on huwiki). It is not deprecated, because it is useful if you run your bot in several wikis and want to experiment on one central page in your home wiki. Use it with -test only!

Use -lang and -family on the command line to work in a wiki other than your bot's home wiki; use cfam and clang if the central page is in your bot's home wiki but the archive to be tested is not. Using cfam and clang slows down your bot, because it fetches the month names from the live wiki again. They are therefore not designed for end users and are not foolproof. cfam and clang MUST precede rex and rex2 if MMM or MMMM appears in them!

Parameters cfam and clang are allowed only in central page configs. If they are applied in an individual config, the bot will ignore them.
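The rules for cfam and clang are easy to get wrong, so here is a minimal sketch of a pre-flight check you could run on your own configs. It assumes each parameter has already been split into a (name, value) pair in the order it appears in the config; that representation and the function name check_cfam_clang are hypothetical, not the bot's internals.

```python
def check_cfam_clang(params, central=True):
    """Check the cfam/clang rules for one config (hypothetical helper).

    params: list of (name, value) pairs in config order.
    Returns (params_to_use, problems).
    """
    if not central:
        # In individual configs the bot ignores cfam and clang.
        kept = [(n, v) for n, v in params if n not in ("cfam", "clang")]
        return kept, []
    problems = []
    names = [n for n, _ in params]
    if "cfam" in names and "clang" not in names:
        problems.append("cfam given without clang")
    for index, (name, value) in enumerate(params):
        # "MMM" is a substring of "MMMM", so one test catches both month placeholders.
        if name in ("rex", "rex2") and "MMM" in value:
            for later in ("cfam", "clang"):
                if later in names[index + 1:]:
                    problems.append(f"{later} must precede {name}")
    return params, problems
```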

There is one more parameter that behaves differently: to. Improper use of this parameter may cause trouble if users begin to create subpages under someone else's page, e.g. the talk page of another user. This parameter is rarely needed. You may use it freely on central pages, because these pages may be watched by several good-faith users and can be protected if necessary. In template-type configs, the bot runs a check and allows this parameter only if it points into the namespace of the same user. So if the default place for TOCs in your wiki is /Index, and User:Example wants the TOC at User talk:Example/TOC or User:Example/Index rather than User talk:Example/Index, they may do so by means of to. However, they are not allowed to place it on the Admins' notice board, because the bot would replace the existing notice board with a private index. In such a case the bot ignores the parameter.
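The same-user rule for to can be sketched as follows. This is only an illustration under stated assumptions, not the bot's actual check: to_target_allowed is a hypothetical name, and the English namespace names "User" and "User talk" would have to be replaced by your wiki's local names.

```python
def to_target_allowed(main_page, target, user_namespaces=("User", "User talk")):
    """Hypothetical check: accept `to` only if it points into the namespaces
    of the same user that owns the main page."""

    def owner(title):
        namespace, sep, rest = title.partition(":")
        if not sep or namespace not in user_namespaces:
            return None  # not a user or user talk page
        return rest.split("/", 1)[0]  # user name without any subpage part

    main_owner = owner(main_page)
    return main_owner is not None and owner(target) == main_owner
```

With the example from the text, User talk:Example may send its TOC to User talk:Example/TOC or User:Example/Index, but not to a project page such as the Admins' notice board.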

I think these limitations are enough for security considerations, but let me know if you find any other security issue.

Command line parameters

Global pywiki parameters

-lang and -family are tested to work well.
-help is not implemented yet; use this page instead.

I am not sure how the other global parameters behave.

Own parameters

-cp defines a central page that contains your configs. It may be given several times, so you can process several central pages in one run. For example, I split my configs across three central pages: one for the most visited community pages, which should be processed more frequently, one for those with less traffic, and one for inactive ("museal") pages that will use the once parameter (not implemented yet). This way I can process them together or separately.
As usual in pywiki, you may give the page as -cp:<page> at the command prompt or just type -cp and the bot will ask you for the page title.
-tpl defines the template for individual configs (e.g. -tpl:user:BinBot/config). Use it as -tpl:<page>, or just type -tpl to make the bot ask for it. This parameter may also be repeated on the command line.
The name of the template is case sensitive and will be searched in the form you type. That means using -tpl:User:BinBot/config instead of -tpl:user:BinBot/config will list the pages but not find the template on them.
-debug switches on debugging of central pages, the page generator and the archive generator. It gives verbose information and is recommended during the learning and testing period, and later whenever there is a problem with rexes. The debug output shows the raw and processed rexes (translated to regular expressions) and all other config parameters, then lists all subpages of the main page, and then the listed, rex and rex2 archives that match the pattern, together with their short names, in their final sort order. It is ideal for debugging rexes. Use your Pause key if necessary.
-coredebug will debug the table generator. This is separated from -debug for easier overview. You will see section titles, table rows and templates that modify the number of contributions (their count will appear as "plus" and "minus").
-test is the same as -debug except that it does not save the TOC. With this switch on, your bot does not modify any page in any wiki; it only prints debug information to the screen (not the coredebug output, which is separate). For a complete test, use -test -coredebug (usually not recommended; you will go mad if you do this too often).
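A minimal sketch of how the own parameters above fit together. It is not the bot's actual argument handling (which goes through Pywikipedia): parse_own_args is a hypothetical name, global parameters such as -lang are simply skipped here, and the injectable ask function stands in for the interactive prompt.

```python
def parse_own_args(args, ask=input):
    """Hypothetical parser for the own parameters described above.

    -cp and -tpl may be repeated; without a value, the user is asked for it.
    -test implies -debug but disables saving.
    """
    opts = {"cp": [], "tpl": [], "debug": False, "coredebug": False, "save": True}
    for arg in args:
        # Split at the first colon only, so "-cp:Wikipedia:Village pump" keeps its title.
        name, _, value = arg.partition(":")
        if name in ("-cp", "-tpl"):
            if not value:
                value = ask("Central page: " if name == "-cp" else "Template: ")
            # Keep the title exactly as typed; template names are case sensitive.
            opts[name[1:]].append(value)
        elif name == "-debug":
            opts["debug"] = True
        elif name == "-coredebug":
            opts["coredebug"] = True
        elif name == "-test":
            # -test behaves like -debug but never saves anything.
            opts["debug"] = True
            opts["save"] = False
        # Anything else (e.g. -lang, -family) is left to Pywikipedia.
    return opts
```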

Planned parameters (not implemented yet)

For exclusion of misused pages
For forcing re-processing of the "once" configs

Testing

You are kindly asked to test the following:

Page and archive generator:
- reading configs from central pages
- finding and sorting archives of a page
- proper work and documentation of rexes
- recognition of long and short month names
- behaviour of the bot in non-Latin wikis, including those with right-to-left direction
Core part (table generator):
- recognition of ==section== boundaries (the most important thing)
- recognition and proper sorting of dates within a section
- recognition of templates in a section (such as "solved" etc.)
- composition and sortability of table
- composition of section titles in the table
- recognition and sorting of old dates given in the olddatepattern parameter of the config (relevant for early archives, up to 2004)
- behaviour of the bot in non-Latin wikis, including those with right-to-left direction
Saving:
- recognition and behaviour of supplementary parameters
- saving the table with templates, categories, title and target page given in config
Usability of the user guide and the bot owners' guide

While most archive names are quite regular (such as Archive01), please try to find naming systems that are as exotic as possible and test them. There are some interesting examples here.

You may use -test, -debug and -coredebug. Another testing option is to uncomment pywikibot.output(text) at the end of the main() function, which makes the bot print the finished wikitable on the screen. This is not included in coredebug because it may be very slow.

If the bot raises CodingError, please let me know immediately with all the details you can share. This error is a safeguard placed in branches that the program should never reach.

In my opinion, the bot should now properly handle:

non-existing central pages
non-existing main page (if the archives still exist as subpages of a "red" page)
non-existing or redirecting subpage given in the list parameter of the config (redirecting rex pages will not be searched at all)
some syntax errors of a rex: illegal # character (it is prohibited in page titles and has a special role in translating rexes to regexes) and repeated patterns of the same group
an empty table (if no archives are found or there is no second-level == section title == in the archives, the result will not be saved)
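Since section boundaries are the most important thing to test, here is a minimal sketch of how second-level headings could be detected and how an empty result maps to "nothing to save". It is an illustration, not the bot's table generator: the names are hypothetical, and it ignores headings hidden in nowiki or comments.

```python
import re

# A second-level heading: exactly two "=" on each side, e.g. "== Title ==".
# Deeper headings such as "=== Sub ===" are not matched.
SECTION_RE = re.compile(r"^==(?!=)[ \t]*(.*?)[ \t]*(?<!=)==[ \t]*$", re.MULTILINE)


def section_titles(wikitext):
    """Return the titles of all second-level sections in an archive."""
    return SECTION_RE.findall(wikitext)


def build_rows(archives):
    """Collect (archive title, section title) rows from a dict of archive texts.

    An empty result means there is nothing to save.
    """
    rows = []
    for archive, text in archives.items():
        rows.extend((archive, title) for title in section_titles(text))
    return rows
```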

The bot tries to create a link in each row of the table that leads to the original section. It should properly handle bold and italic titles, wikilinks and external links. It will not handle templates, HTML tags or HTML-like wikicode tags in section titles. This is not a bug but a limitation: people are very creative when they begin a new section, and I will not rewrite the whole parser. Such titles will not be clickable; users should click a neighbouring one and scroll up or down by one section.
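The link limitation can be made concrete with a minimal sketch that reduces a section title to the text used after "#" in a link, covering only the cases listed above. It is not the bot's code: link_text is a hypothetical name, and real MediaWiki heading anchors involve more rules than this. Titles containing templates or HTML-like tags return None, i.e. they stay unclickable.

```python
import re


def link_text(title):
    """Reduce a section title to plain link text, or None if it cannot be handled."""
    if "{{" in title or re.search(r"</?[A-Za-z]", title):
        return None  # templates and HTML-like tags are not handled
    title = re.sub(r"'{2,}", "", title)  # '''bold''' and ''italic''
    title = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", title)  # [[target|label]] -> label
    title = re.sub(r"\[(?:https?:)?//[^\s\]]+\s+([^\]]*)\]", r"\1", title)  # [url label] -> label
    return title.strip()
```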

Outside of Wikipedia

This script should work properly on any MediaWiki site that you can edit with your bot. Let me know if there is a problem. The archive generator has been tested on some Wikia examples, as shown on the sample page.