Szerkesztő:Bináris/Tartalomjegyzékbot/User guide

This is the user guide to TOCbot, a bot written by Bináris that creates a table of contents from the archives of a given page. TOCbot is run by local bot owners, who will announce the details for your wiki.

TOCbot is controlled through configs. There are two types:

  1. Configs placed on a common page called the central page. They make it possible to describe community pages or to run config tests en masse. This kind of config is intended for bot owners.
  2. Individual configs that are inserted as templates into the page to be processed (in the same way as the templates of archivebot). This kind is intended for other users.

The two types differ slightly, namely in the first line. This guide explains the use of both.

Archives that are not handled by the bot

  • Archives that are not subpages of any page (for example, in Esperanto Wikipedia I found some archives directly in the Wikipedia namespace, but they have since been corrected). Such extraordinary solutions are so rare that they are not worth dealing with. Move them to subpages.
  • Pseudo-archives created by difflinks (Example)
  • Pages that redirect to archives but are not archives themselves (that means you have to update the config if you rename the archives)
  • The bot has nothing to do with Liquid Threads, of course. LQT pages are not considered as archives.

How to make a config

Generally, a config will describe your archives and tell the bot how to find and sort them. This page explains the role of the config parameters.

Central pages

A central page may contain several configs. (Here is an example, and some more below.) Configs consist of three kinds of lines:

  • A line beginning with : followed by the page title. Each config must contain exactly one of these, at the very beginning. Do not put the title in brackets!
  • One or more lines beginning with ::. These lines contain the config parameters in the form parameter = value. (Leading and trailing spaces will be removed, and parameter names are case sensitive.)
  • Zero or more lines beginning with :*. These lines may contain comments and will not be processed.

Any other line (including an empty line) means the end of the config. The bot will seek the next line beginning with a single : and throw away (and possibly crumple and trample) anything that it finds in the way.
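
The line rules above can be pictured with a short Python sketch. It is not TOCbot's actual code: the function name parse_central_page and the layout of the returned dictionaries are assumptions made for illustration, and a :: line without an = sign is treated here as ending the config.

```python
def parse_central_page(text):
    """Split a central page into configs (hypothetical helper, not TOCbot's code)."""
    configs = []
    current = None  # the config being read, or None while skipping other text
    for line in text.splitlines():
        if line.startswith("::") and current is not None and "=" in line:
            # Parameter line: split at the first "=", strip surrounding spaces.
            name, _, value = line[2:].partition("=")
            current["params"].append((name.strip(), value.strip()))
        elif line.startswith(":*") and current is not None:
            continue  # comment line: ignored, the config stays open
        elif line.startswith(":") and not line.startswith(("::", ":*")):
            # A single ":" starts a new config with the page title.
            current = {"title": line[1:].strip(), "params": []}
            configs.append(current)
        else:
            current = None  # any other line (even an empty one) ends the config
    return configs


sample = """Some introductory text
:User talk:Example
::rex = Archive [nn]
:* a comment
::cat = MyCategory

::rex = thrown away, no config is open here
:Project:Village pump
::from = Project:Village pump/Archive
"""
configs = parse_central_page(sample)
```

Parameters are kept as a list of pairs rather than a dictionary because some parameters may be used repeatedly.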

Individual configs

An individual config may be placed anywhere in the introductory part of the page (that is, before the first header line). There may be two or three of them (this is rarely necessary, only if you have changed your naming pattern too many times, and should be avoided if possible). The config may either precede or follow other contents in the introduction such as text, a listing of archives, transcluded subpages, templates or the config template of archivebot. Its construction is similar to that of a central page config, with the following differences:

  • The first line of the config begins the template and contains only its name, exactly in the form the bot owner announced. Note that it is case sensitive!
  • Lines containing the parameters begin with |.
  • Comments begin with |*. Spaces are allowed to increase readability.
  • The template must be closed with }} on a separate line.

A simple example:

{{user:ExampleBot/config
 | rex = Archive [nn]
 | cat = MyCategory
 |* This works if you have /Archive 1 to /Archive 37 and places your TOC in the given category
}}
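
As a rough illustration of how such a template could be read, here is a minimal Python sketch. It is not the bot's real code: parse_individual_config is a hypothetical name, and giving up at the first unexpected line is a simplifying assumption.

```python
def parse_individual_config(page_text, template_name):
    """Return (name, value) pairs of the first config template, or None.

    Hypothetical helper for illustration; template_name is the name
    announced by the bot owner and is compared case sensitively.
    """
    params = []
    inside = False
    for line in page_text.splitlines():
        stripped = line.strip()
        if not inside:
            if stripped.startswith("=="):
                return None  # first header reached: no config in the introduction
            inside = stripped == "{{" + template_name
            continue
        if stripped == "}}":
            return params  # closing braces on a separate line
        if stripped.startswith("|*"):
            continue  # comment line
        if stripped.startswith("|") and "=" in stripped:
            name, _, value = stripped[1:].partition("=")
            params.append((name.strip(), value.strip()))
        else:
            return None  # unexpected line: stop reading the config
    return None


page = """Welcome to my talk page.
{{user:ExampleBot/config
 | rex = Archive [nn]
 | cat = MyCategory
 |* This is a comment
}}
== First thread ==
"""
found = parse_individual_config(page, "user:ExampleBot/config")
missed = parse_individual_config(page, "User:ExampleBot/config")  # wrong case
```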

What is a config for?

The main purpose of a config is threefold:

  1. to find your archives that are preferably subpages of the main page (with the exceptions described below) and sort them out from the other subpages,
  2. to sort them, usually in chronological order, of course,
  3. to supply them with a short name that will be used in the appropriate column of the table.

I will call these parameters determining parameters. There may also be supplementary parameters that tell the bot where to put the table, what header, footer and title to put in it, or that the page is historical and therefore need not be processed each time the bot runs.

Parameters described here are the default ones with English names. (My first thought was to teach everybody Hungarian, but then I changed my mind. Lucky you.) Bot owners are able to localize the names and announce them to the community, so that users may use them in their native language.

People are very creative in naming their archives. So I had to be very creative to recognize them. TOCbot is really versatile and handles a wide variety of naming systems, including various date formats, Arabic/Roman/binary/hexadecimal numbers and letters as well.

A few words about page titles

The title of the page, whether given in the first line of a config or in the from parameter as described below, should follow the canonical namespaces of your wiki. For example, my talk page can be given as Szerkesztővita:Bináris, user vita:Bináris or user talk:Bináris. Whichever of these you click on, it will take you to my talk page. But only the first version is supported by the bot. This is because the bot gets the subpage names from the site's Special:Prefixindex and tries to match the strings composed from your config against them.

Determining parameters

On central pages:

::from = 
::list =
::rex =
::rex2 =

In individual configs (most users will need this form):

|from = 
|list =
|rex =
|rex2 =

The bot needs the title of the page whose archives are to be listed. For central page configs it is in the line beginning with : (as described above); for individual configs it is the page where the template is inserted. I will call this the main page (not to be confused with the main page of a wiki). Unless otherwise specified, each archive is supposed to be a subpage (or sub-subpage etc.) of it.

You may modify this behaviour with the parameter from (without brackets), which tells the bot to look for the archives under another page. from is a fully qualified (absolute) title, and all the archive names should be relative to that page. For example, some users put their talk archives under their user page rather than the talk page, and some older community pages are archived under another page when noticeboards or village pumps are reorganized. The most typical use of this parameter is when the archives are under an Archive subpage instead of being directly under the main page.

Even with this parameter, all the archives must be under the same page (some of them may be direct subpages and others sub-subpages of it). If you have a part of your talk archives under your talk page and the rest under your user page, you should either have two different configs and get two separate tables as a result, or move one group so that they are all together.

Now, after you have told the bot where your archives are, it wants to know what they are. The simplest way is to tell the bot a rule:

My archive names begin with Archive followed by a space and a number from 1 and the newest is 17, so two digits will be enough for quite a while.

Or:

My archives are grouped by two-digit years, each year a subpage of the talk page, and the archives are subpages of the year, named by the long month name. E.g. Mytalk/08/January etc.

Or:

This community page has archives marked with eight digits as subpages of Archive: the month, the day and the year in this order, like Archive/08232010.

Expressions formalizing these rules are called rexes. Those who are familiar with regular expressions will surely recognize some similarity; these rexes are small templates that will be translated to real regular expressions by the bot. The construction of rexes is detailed in the next section. The above mentioned examples would be transformed to rexes like this, respectively:

::rex = Archive [nn]
::rex = [YY]/[MMMM]
::rex = Archive/[MM][DD][YYYY]

For the third one you have an alternative choice:

::from = <name of the main page>/Archive
::rex = [MM][DD][YYYY]

Use | instead of :: in individual configs.
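
To give a feeling for what "translated to real regular expressions" may mean, here is a minimal Python sketch covering only the parts used in the three examples above. It is an illustration, not the bot's implementation: the function rex_to_regex, the PARTS table and the English month list are assumptions, and [nn] is taken to allow 0 to 99 without leading zeros.

```python
import re

# Assumed English month names; a real wiki would supply its own local names.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical translation table for a few rex parts (case sensitive).
PARTS = {
    "YYYY": r"20\d\d",
    "YY": r"\d\d",
    "MMMM": "|".join(MONTHS),
    "MM": r"0[1-9]|1[0-2]",
    "DD": r"0[1-9]|[12]\d|3[01]",
}


def rex_to_regex(rex):
    """Translate a rex into a compiled regular expression (illustrative subset)."""
    pattern = ""
    # Split into constant text and bracketed variable parts, keeping both.
    for piece in re.split(r"(\[[^\]]+\])", rex):
        if piece.startswith("[") and piece.endswith("]"):
            code = piece[1:-1]
            if code in PARTS:
                body = PARTS[code]
            elif set(code) == {"n"} and len(code) >= 2:
                # [nn], [nnn]...: no leading zeros, at most len(code) digits.
                body = r"0|[1-9]\d{0,%d}" % (len(code) - 1)
            else:
                raise ValueError("unsupported part in this sketch: " + piece)
            pattern += "(" + body + ")"
        else:
            pattern += re.escape(piece)  # constant text is matched literally
    return re.compile(pattern)


numbered = rex_to_regex("Archive [nn]")
by_month = rex_to_regex("[YY]/[MMMM]")
by_date = rex_to_regex("Archive/[MM][DD][YYYY]")
```

Each pattern is meant to be tried with fullmatch against a subpage name relative to the main page; the captured groups are the variable parts.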

Naming methods are often changed when people realize that there is a more comfortable way or when they begin to use the archivebot. That's why I made it possible to use two of these expressions: rex for the older archives and rex2 for the newer ones if necessary. But sometimes there are archives that don't fit the pattern, usually the oldest ones; for these you may use the list parameter. With it you just have to enumerate the subpages, separated by the | sign (a comma would not be good because it is a legal character in titles, so it might cause a conflict).

At least one of list, rex and rex2 should be used; otherwise there is no point in running the bot. Combined use of these three will generally be enough to describe the archives of one page, except for a totally confused and irregular naming unsystem. In most cases one simple rex is enough.

Some fast and easy examples:

  • A simple rex with serial numbers
  • Another simple numbering
  • Combined use of from, list and rex
  • Rex & rex2
  • From with list & rex or rex & rex2
  • List, rex & rex2 together
  • List with rex

Parts of a rex

A rex may contain any constant text plus variable parts. Variable parts are enclosed in single square brackets, and there may be at most five of them: one for the year, one for the month, one for the day, one for a numbering part (marked with a number or letters), and an optional part. Again, they are case sensitive. The values are restricted in order to increase the accuracy of the pattern.

Year
  • [YYYY] for a 4-digit year number. It has to begin with 20. Don't try to make me believe you created your first archive in Wikipedia back in 1998.
  • [YY] for a 2-digit year number. Leading zero is mandatory. (09 and 2009 are valid years, while 9 is not.)
Month
  • [MMMM] for the long name of the month, according to the settings of your wiki. This means it is in your native language; in English wikis the bot will look for January/February etc., in Hungarian január/február etc. For Greek language Wikimedia has the default Ιανουάριος/Φεβρουάριος etc., and your wiki will have these names if they are unchanged, while in Greek Wikipedia they have overwritten the default to genitive form, so [MMMM] means Ιανουαρίου/Φεβρουαρίου etc. These Greek examples show the difference.
  • [MMM] for the local short month name, as above (Jan/Feb etc. in English)
  • An ! (exclamation mark) after MMMM or MMM will remove any dot or comma from the end of month name, if there is one. For example, German and Greek Wikipedias put a dot (.) after some (but not all) of the short month names, and Latvian Wikipedia puts a comma (,) after both the short and the long month names. See an example from dewiki.
  • Additionally, you may put a letter u after MMMM, MMMM!, MMM or MMM! for upper case and l for lower case if you want to force the bot to use a case other than the default (for example [MMMl] for jan/feb instead of Jan/Feb). If you follow the above link, you will find three examples of the use of [MMMMu]: Ctrl F "Poco" for a Spanish example and "ru:Википедия:Форум/Правила" for a Russian one, and there is one more from nowiki.
  • [MM] for a fixed width two-digit month number, 01 to 12
  • [M] for a month number without leading zero, 1 to 12
Day
  • [DD] for a fixed width two-digit day number, 01 to 31
  • [D] for a day number without leading zero, 1 to 31
    Note that although these may look like plain numbers, they have a special role; see the next two sections.
Numbering

One of these options may be used; use of more than one will result in an error message, and your archives will not be processed.

  • [N], [NN], [NNN]… for a fixed width number (NN for 2 digits). Use this if the number of digits is really constant; it is safer. (For example, 01, 02 … 29.)
  • [nn], [nnn] etc. for numbers without leading zeros; use this if your numbers are from 1 to 15 rather than 01 to 15. Note that [n] is not allowed, because it would be senseless. Write as many n's as the maximum number of digits you may need. If your numbers go from 01 through 95 and 100 is expected soon, write [nnn].
  • [H], [HH] etc. for fixed width hexadecimal numbers (with upper case letters)
  • [hh], [hhh]… for variable width hexadecimal numbers, also upper case, as described at [nn]
  • [BBBB] etc. for fixed width binary numbers. This is unique, because long binary numbers may contain spaces, too. This is for my own sake; as my name is Bináris (which means Binary), my archives are numbered with 0000 0001, 0000 0010, 0000 0011 and so on. My rex is [BBBB BBBB].
  • [bbbb] (four b's is only a typical example) for variable width binary numbers, as above. These may be combined, but currently only the leftmost group may lack leading zeros, so [bbbb BBBB BBBB] is valid, while [bbbb bbbb] is not.
  • [L], [LL], [LLL]… for fixed width upper case English letters. For example, [LL] means AA…ZZ, and [L] from A to Z.
  • [l], [ll], [lll]… for fixed width lower case English letters, for example [l] for a to z.
    Don't use L and l for any text. If you have constant text, type it. This option is for a variable part that is in fact numbering, but with letters rather than numbers. Variable width is not supported, because it causes too many conflicts and is not worth dealing with.
  • [R] (single R only!) for a Roman number. Only regular upper case Roman numbers are supported; that means 49 has to be XLIX, and IL will not be recognized as 49, nor IIII as 4. You may not specify the number of letters. The bot will collect the letters I, V, X, L, C, D, M for as long as they continue, and then try to interpret the result as a Roman number. If there is any of these letters directly before or after the Roman number, recognition will fail. You will find some examples by Ctrl F-ing "[R]".
Optional part
  • [*] is the well-known wildcard and will match any sequence of characters, including nothing; the matches will be sorted according to the default order (I think this is the Unicode order). It may stand either at the end of the rex or inside it. Use Ctrl F to find examples; there are a lot of them.
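
The rule for [R] (regular Roman numbers only, so XLIX is accepted while IL and IIII are not) can be sketched in Python as follows. This is only an illustration of strict Roman numeral recognition; the names STRICT_ROMAN and roman_to_int and the upper limit of 3999 are assumptions, not taken from the bot.

```python
import re

# Strict form: thousands, hundreds, tens and units, each written the regular way.
STRICT_ROMAN = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text):
    """Return the value of a regular upper case Roman number, or None."""
    if not text or not STRICT_ROMAN.fullmatch(text):
        return None
    total = 0
    # Pair every letter with its successor; a smaller letter before a
    # larger one is subtracted (the I in IX), otherwise it is added.
    for letter, following in zip(text, text[1:] + " "):
        value = VALUES[letter]
        total += -value if VALUES.get(following, 0) > value else value
    return total
```

For example, roman_to_int("XLIX") gives 49, while roman_to_int("IL") and roman_to_int("IIII") give None.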

Sorting order

When creating the table of contents, the bot always takes the listed archives first, just in the order you wrote them. Archives matching rex follow them, and at the end come the archives matching rex2.

Within rex or rex2 the sorting precedence is, regardless of the order of the parts in the expression:

  1. year
  2. month
  3. day
  4. any number or letter
  5. * (the optional part)
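
The precedence above behaves like sorting by a tuple. The following Python sketch shows this for names matching the rex [MM][DD][YYYY]; sort_key and parse_mmddyyyy are hypothetical helpers, and treating a missing part as 0 (or as an empty string for the optional part) is an assumption.

```python
def sort_key(parts):
    """Build a sort key in the order year, month, day, number, optional part."""
    return (
        parts.get("year", 0),
        parts.get("month", 0),
        parts.get("day", 0),
        parts.get("number", 0),
        parts.get("rest", ""),
    )


def parse_mmddyyyy(name):
    """Read the parts of a name matching [MM][DD][YYYY] (illustration only)."""
    return {"month": int(name[0:2]), "day": int(name[2:4]), "year": int(name[4:8])}


names = ["08232010", "01152011", "12012009"]
ordered = sorted(names, key=lambda name: sort_key(parse_mmddyyyy(name)))
# ordered is ['12012009', '08232010', '01152011']: the year decides first,
# although it stands last in the page name.
```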

Short name

Each archive page will get a short name that will indicate the page in the table of contents. The shorter it is, the nicer the table will be. Short names are composed as follows:

  • For listed archives the short name will be the page title just as you listed it, without leading and trailing spaces. For example, if the archives are under Page/Archive/, and you specify them as
    ::list = Archive/myfirstarchive | Archive/myotherarchive
    the "short" names will be as long as they stand here. But if you use
    ::from = Page/Archive
    ::list = myfirstarchive|myotherarchive
    instead, they will shorten to the page names themselves. (Use | instead of :: in individual configs.)
  • For rexed archives, the variable parts will be concatenated in their original order. For example, if your archive name is Archive/2011 January, and the rex is Archive/[YYYY] [MMMM], the short name will be 2011January.
  • In some rare cases the bot finds no variable parts in the name. The short name will then be a - sign (hyphen). This may happen if you use a constant name as rex or the only variable element of the rex is [*] and it matches nothing. For example, if the archives are archive, archive1 and archive2 and the rex is archive[*], the names will be -, 1 and 2, respectively.
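
The rules for rexed archives can be sketched in a few lines of Python. The two regular expressions below are hand-written stand-ins for what the rexes Archive/[YYYY] [MMMM] and archive[*] might be translated to (with only three month names for brevity), and short_name is a hypothetical helper, not the bot's function.

```python
import re


def short_name(match):
    """Join the captured variable parts in their original order; '-' if none."""
    joined = "".join(group for group in match.groups() if group)
    return joined or "-"


# Assumed stand-in for the rex Archive/[YYYY] [MMMM].
by_month = re.compile(r"Archive/(20\d\d) (January|February|March)")
# Assumed stand-in for the rex archive[*].
wildcard = re.compile(r"archive(.*)")

monthly = short_name(by_month.fullmatch("Archive/2011 January"))
plain = [short_name(wildcard.fullmatch(name))
         for name in ["archive", "archive1", "archive2"]]
```

Here monthly is "2011January" (the constant space is dropped), and plain is ["-", "1", "2"].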

Considerations

Sometimes it seems that two solutions are equivalent, and sometimes they really are. But there may be fake equivalents, too.

  • Take care of the sorting order. One of the above examples is:
    ::rex = [YY]/[MMMM]
    (With | instead of :: in individual configs.)
    You may think that [nn] or [NN] or [*] would do instead of [YY]; any of them selects the two digits and sorts them among themselves, but month has higher precedence than any of these, so the final order will be 08/January, 09/January … 11/January, 07/February … 11/February, and so on.
  • Rex and rex2 or list and rex should not overlap. If there is any archive that fits into two or three of these patterns, it will be processed repeatedly, giving a false result.
  • There are cases when two solutions are really equivalent, but result in different short names. You may want to decide which of them should appear in your table. Ctrl F the phrase "short name" on this page for examples.
  • Avoid conflicts between the parts of the name and between archives and other subpages, especially when using [*], which matches any characters. E.g. in this example the Tea room of English Wiktionary has subpages Archive 2003 to Archive 2010; however, while pages Archive 2003 to Archive 2006 are yearly archives, the others from Archive 2007 on are not, so Archive [YYYY] is a wrong choice. Or suppose you have subpages Mytalk/2009, Mytalk/2010, Mytalk/2011 and Mytalk/9999, three of them being yearly archives and the last one the talk page of your 9999 subpage, which lists the 9999 most important songs of the Universe. [NNNN] or [*] would list all of them, but [YYYY] selects only the year numbers because it is restricted to numbers between 2000 and 2099. Similarly, [nn] may be any number between 0 and 99, while [DD] is restricted to 01 to 31. If you have subpages Mytalk/May 2009, Mytalk/May 2010 and Mytalk/May 2011, [MMM] [YYYY] and [MMM][*] seem to be almost equivalent (except for a small difference in short names) until you have Mytalk/May I have a question?, in which case the latter expression will list this one, too, while the former behaves normally. This Wikia example shows an interesting case of conflict.

If you are in doubt, you may prepare both variants of the config and ask the bot owner to run a test on them.
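
The Mytalk conflicts described above can be reproduced with plain regular expressions in Python. The four patterns are hand-written stand-ins for [YYYY], [NNNN], [MMM] [YYYY] and [MMM][*] (with May as the only month for brevity); they are assumptions for illustration, not the expressions the bot really generates.

```python
import re

year_like = re.compile(r"20\d\d")   # assumed stand-in for [YYYY]
any_four = re.compile(r"\d{4}")     # assumed stand-in for [NNNN]

subpages = ["2009", "2010", "2011", "9999"]
only_years = [s for s in subpages if year_like.fullmatch(s)]
four_digits = [s for s in subpages if any_four.fullmatch(s)]

month_year = re.compile(r"(May) (20\d\d)")  # assumed stand-in for [MMM] [YYYY]
month_any = re.compile(r"(May)(.*)")        # assumed stand-in for [MMM][*]

talk = ["May 2009", "May 2010", "May I have a question?"]
strict = [s for s in talk if month_year.fullmatch(s)]
loose = [s for s in talk if month_any.fullmatch(s)]
```

The restricted patterns leave out 9999 and the question page, while the permissive ones pick up everything.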

Supplementary parameters

These are not necessary to create the index, but may enhance the usability of it. Note that these are also case sensitive!

On central pages:

::title = 
::to =
::template =
::cat =
::once =
::olddateformat =

In individual configs (most users will need this form):

|title = 
|to =
|template =
|cat =
|once =
|olddateformat =
title
The bot gives a default section title to all the TOCs it creates. You may modify this here. The title will appear as you define here (wikilinks allowed). Don't insert the = signs.
to
TOCs have a default location relative to the main page, for example /Index. This is defined by the bot owner (or the community). You may modify this location here. to is a fully qualified (absolute) title. Don't use brackets! E.g. you are User:Example, and you would like to place the TOC of your talk archives at User:Example/TOC rather than User talk:Example/Index. Write here User:Example/TOC. In individual templates the new location must be a subpage or sub-subpage etc. of your user page or your talk page. Parameters directing to any other location will be disregarded. Trying to put your page in any other location may qualify as vandalism, depending on the tolerance of your community.
This parameter is required if you have to use two individual configs on the same page (e.g. your archives follow four separate patterns, and there are too many of them to use a list). In this case two separate TOCs will be generated, and they must have separate locations (otherwise the second one would overwrite the first one).
template
Relax, this is the last one beginning with 't'. The given string (either a real template or one of your subpages) will be inserted above the table between {{ }} signs (don't include them!). You may put templates on top of your page this way. Parameters are allowed as template|param1|param2. Should you want more templates, use this parameter repeatedly.
cat
This will insert the given category at the bottom of the page. May also be used repeatedly. Don't insert "Category", just the name of it. A sorting key is allowed in the form of categoryname|key (but must not begin with space, use * instead to sort in the beginning of the category).
once
At a later stage a True value will mark historical pages whose archives are no longer updated and should be processed only if explicitly told. (Not implemented yet.)
olddateformat
At a later stage a regular expression may be placed here to describe the ancient datestamp formats used back in 2004 and earlier. Most of the users will never need this. (Not implemented yet. The bot can handle these old archives without this, only dates will not be recognized properly. However, you may already see here an example that matches three types of old timestamps.)
*
Place here any comment, but don't put space before the asterisk. The syntax is |* in individual configs and :* on central pages.
Example
This version of my own TOC has three templates (two of them with parameters) and two categories (one of them with a sorting key). The appropriate config template in the introductory part of my talk page is as follows (sablon stands for template and kat for cat, because they are localized parameters):
{{user:BinBot/config
 |rex = Archív/[BBBB BBBB]
 |* Szerintem ez egy komment
 |sablon = ambox|type=notice|text=Próba: így lehet beilleszteni egy sablont két paraméterrel
 |sablon = SN
 |sablon = bedolgoz|Kuka
 |kat = TOCbot|*
 |kat = Személyes tartalomjegyzékek
}}
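
How the template and cat parameters end up around the table can be pictured with the following Python sketch. It uses the default English parameter names; the function wrap_supplementary, the table placeholder and the Category namespace name are assumptions for illustration, and the output layout of the real bot may differ.

```python
def wrap_supplementary(params, table="(table)", category_ns="Category"):
    """Put templates above the table and categories below it (illustration)."""
    # Each template value is wrapped in {{ }}; parameters stay as written.
    top = ["{{" + value + "}}" for name, value in params if name == "template"]
    # Each cat value may carry a sorting key after "|".
    bottom = ["[[" + category_ns + ":" + value + "]]"
              for name, value in params if name == "cat"]
    return "\n".join(top + [table] + bottom)


params = [
    ("template", "SN"),
    ("cat", "TOCbot|*"),
    ("template", "bedolgoz|Kuka"),
]
page = wrap_supplementary(params)
```

With these values, page consists of the lines {{SN}}, {{bedolgoz|Kuka}}, (table) and [[Category:TOCbot|*]], in this order.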

Newsletter

TOCbot has a newsletter at Szerkesztő:Bináris/TOCbot/Newsletter. New features and other changes will be announced there, more often at first and only occasionally later. You may add it to your watchlist if you visit Hungarian Wikipedia regularly, or subscribe to it at Szerkesztő:Bináris/TOCbot/Newsletter/subscribers. In that case my bot will send you the new sections by e-mail.

Troubleshooting

  • Does your config contain only the three types of lines described in the #How to make a config section? Any other line, including an empty one, or any misspelled parameter name (remember they are case sensitive!) will make the bot stop reading the config.
  • Are the archive titles relative to the main page?
  • If there is a from parameter, is it absolute? This is the only one that should not be relative to the main page.
  • Are the rexes correct? Try to find similar examples on the example page. Read the section #Considerations again.
  • Are you looking for your table in the right place? An error in the parameter to may put it somewhere else. Once the bot has created your TOC for the first time, add it to your watchlist.
  • Are there section titles in your archives? If not, the bot has nothing to do with them. Just as the TOC of a book needs chapter titles, TOCbot requires second-level section titles (such as == Title ==). Parts before the first title (typically a welcome message) won't be processed; if an archive has no section titles, it will not be processed at all; and finally, if none of your archives has a section title, you would get an empty table, which the bot will not save.
  • Ask the bot owner to run a test. It will list the archives without creating the table and is very efficient in troubleshooting. If you are the bot owner yourself, run the bot with -test or -debug. If you are not, please don't forget that bot owners are volunteers just as you are; they probably have a list of similar requests besides their own work, and sometimes your problem is not their most important problem. Explain the matter precisely and be patient.
  • Make sure your computer is properly plugged in. Lack of electricity may prevent you from seeing the table of contents.
  • Were any of your pages renamed? You should then update the config, since the bot does not handle redirects.
  • Ask the bot owner if he/she has excluded your page for any reason. They have the possibility to do so if your config causes trouble or if they think you misuse their bot.
  • Naturally, any changes made on your config will take effect only at the next run of the bot, not immediately.