This bot gathers possible anniversaries to help maintain the anniversaries template on the main page. You are welcome to adapt it for your own Wikipedia.

It is currently run by Atobot. The command: anniversary.py nextmonth5 -noskip
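
A minimal sketch of what the nextmonth5 argument selects, assuming it mirrors the next_month() wrapper further down: the month after the current one, plus the remainder of that month's year modulo 5. The function name next_month_target is hypothetical and not part of the script.

```python
import datetime


def next_month_target(today):
    """Return (year, month, year % 5) for the month following `today`."""
    month = today.month + 1
    year = today.year
    if month > 12:  # December rolls over to January of the next year
        month = 1
        year += 1
    return year, month, year % 5


print(next_month_target(datetime.date(2023, 12, 15)))  # (2024, 1, 4)
```

With a remainder of 4 the bot would look for years ending in 4 or 9.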

Documentation

"""
This bot tries to gather relevant articles for the anniversaries on main page.

In Hungarian Wikipedia anniversaries are grouped by 5 years, so we have to
search for years ending in 0 or 5, 1 or 6, etc. But you may search every
year as well.
(C)(D)(E)(F)(G)(A)(H)(C) Bináris, ato, Tacsipacsi, 2014–2023

KNOWN PARAMETERS
nextmonth:  takes the next month from calendar, and goes through every day
            ("every year" mode)
nextmonth5: as nextmonth, but searches for round anniversaries
            (elapsed time should be divisible by 5)
Without a mode parameter the bot works with the data hard-wired in main().
-noskip:    forces the bot to reprocess existing target pages, which are
            skipped by default.
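
A minimal sketch of the year grouping described above, assuming the same
patterns that DailyBot.__init__ builds. The helper name year_pattern is
hypothetical, and [0-9] stands in for the digit class used in the script:

```python
import re


def year_pattern(year5=None):
    # Hypothetical helper: title pattern of year articles for one remainder.
    if year5 is None:
        return '^[1-9][0-9]*$'  # every year
    if year5 == 0:
        return '^([0-9]+0|[0-9]*5)$'  # endings 0 and 5, but never year 0
    return f'^[0-9]*[{year5}{year5 + 5}]$'


pattern = re.compile(year_pattern(3))
print([y for y in ('1848', '1953', '1956', '2023') if pattern.match(y)])
# ['1848', '1953', '2023']
```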

HELP FOR LOCALIZATION
This script is developed for Hungarian Wikipedia.
If you want to port this script to another Wikipedia, you should modify:
global stopsections
global infobox (if there is no such word, your task may be hard, eat chocolate)
global basepage -- where to save
global monthnames
global header
DailyBot.yearregex, dateregex, dateregexwithyear (__init__)
DailyBot.date_composer()
DailyBot.categories()
DailyBot.create_page()
This requires a basic knowledge of regular expressions.
Remove the line "from binbotutils import levelez" below and any lines
containing "levelez", "hiba" or "fatal" (these serve my own needs).
When publishing the modified script, don't remove this help text and link
your script to hu:user:BinBot/anniversary.py as original.
See also [[hu:User:BinBot/anniversary.py/doc]].
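
As an illustration of the regex part of localization, here is a rough sketch
of a date pattern for a hypothetical English-language wiki, modelled on
DailyBot.dateregex. The names month_names and date_pattern are assumptions,
and re.escape() is used only to keep the sample free of escape sequences:

```python
import re

# Hypothetical month names; English dates are assumed to have no final dot.
month_names = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November',
               'December']


def date_pattern(month, day):
    # Optional link brackets, optional leading zero, optional piped text.
    name = re.escape(month_names[month - 1])
    number = f'0?{day}' if day < 10 else str(day)
    opening, pipe, closing = (re.escape(s) for s in ('[[', '|', ']]'))
    return re.compile(
        f'(?i)({opening})?(?P<date>{name} *{number})'
        f'({pipe}.*?)?({closing})?(?![0-9])')


regex = date_pattern(1, 5)
for sample in ('[[January 5]]', 'january 05', 'January 50'):
    print(sample, bool(regex.search(sample)))
```

The negative lookahead at the end keeps "January 5" from matching inside
"January 50".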

Structure:
  DailyBot processes one given day (with given year endings, if applicable).
  CallBot validates the parameters for one day and calls DailyBot.
  Any further frame (wrapper) may be written to call CallBot in a loop.
  A few samples are at the end.
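
A rough sketch of such a wrapper frame: the loop below walks an inclusive
range of days and hands each one to a callback. The names process_range and
run_day are hypothetical; in the script, run_day would build
CallBot(month, day, ...) and call its run() method.

```python
# Lengths of the months; February has 29 so that leap days are covered too.
days = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def process_range(month, first, last, run_day):
    # Hypothetical wrapper: call run_day(month, day) for first..last.
    if not 1 <= month <= 12:
        return 0
    last = min(last, days[month - 1])  # clamp to the end of the month
    count = 0
    for day in range(max(first, 1), last + 1):
        run_day(month, day)
        count += 1
    return count


seen = []
process_range(2, 27, 31, lambda month, day: seen.append((month, day)))
print(seen)  # [(2, 27), (2, 28), (2, 29)]
```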
"""
'''
TODO
    Removal of cite* templates from main text
    Extraction of birth and death dates from introduction
    Some more wrappers and commandline parameters
    Import regexes from textlib rather than copying
    PEP 3107
    Get local monthnames automatically
'''

import re
import datetime
import locale
import pywikibot
from pywikibot import pagegenerators
locale.setlocale(locale.LC_ALL, '')

# from binbotutils import levelez
# Remove if you are not me

# List of usual section titles at the end of the article such as sources, where
# the bot will stop searching. Dates beyond this point are irrelevant.
# The bot will skip the trailing part from the first match of these.
stopsections = [
    'Külső hivatkozások',
    'Források',
    'Jegyzetek',
    'Lásd még',
    'Kapcsolódó szócikkek',
    'További információk',
]
stopsectionregex = re.compile(fr'== *({"|".join(stopsections)}) *==')
# A word that occurs in the name of infobox templates:
infobox = 'infobox'
# A page under which the subpages will be saved:
basepage = 'Wikipédia:Évfordulók kincsestára'
# Month names in your wiki (see also DailyBot.date_composer()):
monthnames = [
    'Január', 'Február', 'Március', 'április', 'Május', 'Június',
    'Július', 'Augusztus', 'Szeptember', 'Október', 'November', 'December'
]
# Excepted articles (a list of compiled regexes)
# Will be searched, not matched! So mark ^ and $ if necessary.
exceptions = [
    # Except month names and dates
    re.compile(fr'(?i)^({"|".join(monthnames)})' + r'( \d{1,2}\.)?$'), # (?i) because of 'április'
    # Except years
    re.compile(r'^(I. e. )?\d+$'),
    re.compile(r' (keresztnév)$'),
    # Except lists
    re.compile(r'listája'),
    # Irrelevant sport articles
    re.compile(r'(szezonja|bajnokság|labdarúgó ?kupa|kupája|válogatott)'),
    re.compile(r'(átigazolás|játékokon)'),
    # Irrelevant TV series
    re.compile(r'televíziós sorozat'),
]
# A header for the result pages
header = '''<!-- A lapot bot frissíti. Ha változtatás szükséges, jelezd Binárisnak! -->
{{tudnivalók eleje}}
Ez a lap a napi évfordulósablon elkészítéséhez nyújt segítséget. <br />
Itt van egy leírás, hogyan készült: [[Szerkesztő:BinBot/anniversary.py/doc]].
{{tudnivalók vége}}
Utolsó módosítás: ~~~~~

'''
# --*-- End of global stuff to be localized. Go to DailyBot to continue. --*--
# Section titles (second level is enough for us):
sectionpattern = re.compile(r'^==[^=].*?==')
references = re.compile(r'(?ism)<ref[ >].*?</ref>')
HTMLcomments = re.compile(r'(?s)<!--.*?-->')
TEMP_REGEX = re.compile(
    r'(?sm){{(?:msg:)?(?P<name>[^{\|]+?)(?:\|(?P<params>[^{]+?(?:{[^{]+?}[^{]*?)?))?}}')
# The above 3 regexes are copied from textlib.py. TEMP_REGEX is far from
# perfect for nested templates but currently the best known effort.
days = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def today():
    return datetime.datetime.today()

class CallBot:
    """
    Validate the parameters and call DailyBot if possible.

    It's not foolproof. If you call it with boolean values, blame yourself.
    For parameters see DailyBot.
    """

    def __init__(self, month, day, yearmodulo5=None, overwrite=False):
        self.month = month
        self.day = day
        self.year5 = yearmodulo5
        self.ow = overwrite # No need to validate.

    def validate(self):
        """Return True if month, day and year remainder are all in range."""
        if not self.month or not self.day:
            return False
        if not 1 <= self.month <= 12:
            return False
        if self.year5 is not None and not 0 <= self.year5 <= 4:
            return False
        # February is validated against 29 days (see the days list).
        return 1 <= self.day <= days[self.month-1]

    def run(self):
        if self.validate():
            d_bot = DailyBot(self.month, self.day, self.year5, self.ow)
            d_bot.run()
        else:
            pywikibot.output('Invalid parameters') # May be detailed later.
            return

class DailyBot:
    """
    Search articles for a given day.

    Call it through CallBot.
    Parameters:
    month, day: a day of year as integers (e.g. 12 and 26); compulsory
    yearmodulo5: integer between 0 and 4, see the preface; optional
    If given, the bot will search for years that have the given remainder
    modulo 5 (e.g. yearmodulo5=3 => we search for years ending in 3 or 8).
    If None, all the years are valid results.
    overwrite: if True, existence of the result page won't be checked,
    rather the target will be overwritten and rebuilt (defaults to False).
    """

    def __init__(self, month, day, yearmodulo5=None, overwrite=False):
        self.month = month
        self.day = day
        self.year5 = yearmodulo5
        self.overwrite = overwrite
        self.site = pywikibot.Site()
        # This dictionary will contain the roles where the date is found.
        # Currently birth, death, infobox and other, and articles of years.
        # Each list contains dictionaries with 'page', 'year' and 'text'.
        # 'year' is the sortkey and is not directly output.
        self.data = {
            'births': [],
            'deaths': [],
            'infobox': [],
            'other': [],
            'years': [],
        }
        # A regex for the titles of articles about years. I don't bother with
        # B.C. years because their anniversaries would be confusing anyway.
        # Listing B.C. years would seriously slow the program down!
        # In most wikis year articles are titled with a plain number, while
        # articles about numbers carry some addition, so usually you
        # don't have to modify this.
        if self.year5 is None:
            # If you want to search anniversaries in every year:
            self.yearregex = r'^[1-9]\d*$'
        elif self.year5 == 0: # 0 is not a year
            # Will search for years with the appropriate ending modulo 5:
            self.yearregex = r'^(\d+0|\d*5)$'
        else:
            self.yearregex = fr'^\d*[{self.year5}{self.year5 + 5}]$'
        # Create a regex which shows how the dates are written in your wiki.
        # Think of various linking possibilities!
        # This one does not contain year; may be used in the articles of years.
        if self.day < 10:
            r = fr'0?{self.day}'
        else:
            r = fr'{self.day}'
        m = monthnames[self.month-1]
        # ?P<year> identifies the year part (this is the sortkey for results)
        # ?P<date> identifies the date part w/o year (currently not used AFAIK)
        self.dateregex = re.compile(
            fr'(?i)(\[\[)?(?P<date>{m} *{r}\.?)(\|.*?)?(\]\])?(?!\d)')
        # And this one with years (I don't treat 0 separately here).
        # I use the bot with year5 only, so I may use it in the regex.
        # If you run the bot without year5 (None), rewrite this regex,
        # because self.year5 + 5 raises TypeError in that case!
        self.dateregexwithyear = re.compile(
            fr'(?i)(\[\[)?(?P<year>\d*[{self.year5}{self.year5 + 5}])'
            fr'(\]\])?\.? *(\[\[)(?P<date>{m} *{r}\.?)(\|.*?)?(\]\])(?!\d)'
        )
        # For localization modify the page name.
        # Page will be created with the title basepage/daytitle.
        self.daytitle = f"/’{self.year5} és ’{self.year5 + 5}"
        self.daytitle += f"/{self.month:02d}-{self.day:02d}"
        self.targetpage = basepage + self.daytitle # Where to save the result

    def date_composer(self, month, day):
        """
        Generate the title of the date articles as in your wiki.

        Subject to localization.
        """
        return f'{monthnames[month-1]} {day}.'

    def list(self, month, day):
        """Return a page generator for the articles linking to the date."""

        daypage = pywikibot.Page(self.site, self.date_composer(month, day))
        return pagegenerators.RegexFilterPageGenerator(
            pagegenerators.NamespaceFilterPageGenerator(
                daypage.getReferences(follow_redirects=False), 0, self.site),
            # 'none': keep only titles that match none of the exceptions
            exceptions, quantifier='none')

    def year_list(self):
        """Return a page generator for the articles of years."""
        return pagegenerators.RegexFilterPageGenerator(
                self.site.allpages(
                    start='1', filterredir=False), [self.yearregex])
        # Will have to break after '999' or else you will be old when it ends.

    def parse3(self, text):
        """
        Divide the page text to 3 parts.

        A: text before the first == level section title
        B: text from the first section title to the first stopsection
        C: text from the first stopsection to the end.
        A and B will be returned as a tuple and poor C will be thrown away.
        """
        lines = text.splitlines(keepends=True)
        comeon = True
        where = 0
        linenum = 0
        tx1 = tx2 = ''
        while comeon and linenum < len(lines):
            line = lines[linenum]
            if sectionpattern.match(line) \
                    and not pywikibot.textlib.isDisabled(text, where):
                tx2 = line
                comeon = False
            else:
                tx1 += line
            where += len(line)
            linenum += 1
        comeon = True
        while comeon and linenum < len(lines):
            line = lines[linenum]
            if stopsectionregex.match(line) \
                    and not pywikibot.textlib.isDisabled(text, where):
                comeon = False
            else:
                tx2 += line
            where += len(line)
            linenum += 1
        return (tx1, tx2)

    def template_processor(self, page, introtext):
        """
        Treat the templates in the top section.

        Infoboxes in the top section will be processed and removed.
        Other templates will just be removed (most often these are amboxes).
        """
        templates = pywikibot.textlib.extract_templates_and_params(introtext)
        # See https://doc.wikimedia.org/pywikibot/master/api_ref/textlib.html#textlib.extract_templates_and_params
        # for return values
        needs_birthdate = True
        needs_deathdate = True
        for t in templates:
            if re.search(infobox, t[0]):
                # print(t) # debug only
                for k in t[1].keys():
                    # pywikibot.output(k) # debug only
                    # pywikibot.output(t[1][k])
                    # Workaround for a specific huwiki duplication
                    if k.lower().strip() == 'alsablon':
                        continue
                      
                    # If there is a birth or death date in the infobox,
                    # but not the desired date, we don't have to search for it
                    # in Wikidata, which is expensive.
                    if re.search(r'születési? ?dátuma?', k):
                        needs_birthdate = False
                    elif re.search(r'halál(ozási?)? dátuma?', k):
                        needs_deathdate = False

                    m = self.dateregexwithyear.search(t[1][k])
                    if m:
                        # We have just found the date we are looking for :-)
                        d = {
                            'page': page,
                            'year': int(m.group('year')),
                            'text': k + ' = ' + m.group()
                        }
                        if re.search(r'születési? ?dátuma?', k):
                            self.data['births'].append(d)
                        elif re.search(r'halál(ozási?)? dátuma?', k):
                            self.data['deaths'].append(d)
                        else:
                            self.data['infobox'].append(d)
                        # pywikibot.output('<<green>>Bingó! ' + d['text'] + '<<default>>')
        if needs_birthdate or needs_deathdate:
            self.search_birth_death_date(
                page, needs_birthdate, needs_deathdate)
        # Removal (must be repeated for nested templates):
        while TEMP_REGEX.search(introtext):
            introtext = TEMP_REGEX.sub('',introtext)
        return introtext

    def birth_death(self, page, introtext):
        """Try to find birth and death dates in introduction."""
        # But not yet
        pass

    def search_birth_death_date(self, page, needs_birthdate, needs_deathdate):
        """
        If the page is about a person, try to find dates in Wikidata.

        Only if the dates were not found in the infobox.
        Some infoboxes do not contain birth and death dates explicitly,
        rather display them with some Lua magic. In this case we cannot extract
        them from page text, so we have to search for them directly.
        """

        def treat(prop, section):
            claim = page.get_best_claim(prop)
            try:
                _date = claim.getTarget().toTimestamp()
            except (AttributeError, ValueError):
                # AttributeError: no claim or no target; ValueError: BC years
                return
            if _date.month == self.month and _date.day == self.day:
                year = _date.year
                if re.match(self.yearregex, str(year)):
                    text = ' [[d:' + page.data_item().title()
                    text += '|A Wikidata szerint]] ' + str(year)
                    d = {
                        'page': page,
                        'year': year,
                        'text': text,
                    }
                    self.data[section].append(d)
                    # pywikibot.output('<<green>>Bingó! ' + d['text'] + '<<default>>')

        if needs_birthdate:
            treat('P569', 'births')
        if needs_deathdate:
            treat('P570', 'deaths')

    def other_dates(self, page, text):
        # TODO: remove any cite* templates first (not implemented yet)

        for m in self.dateregexwithyear.finditer(text):
            # print(m.group()) # debug only
            minus = min(m.start(), 60)
            plus = min(len(text) - m.end(), 60)
            show = text[m.start()-minus : m.end()+plus].replace('\n', ' ')
            show = show.replace('nowiki>', '') # Just in case
            show = "''<nowiki>" + show + "</nowiki>''"
            d = {
                'page': page,
                'year': int(m.group('year')),
                'text': show
            }
            self.data['other'].append(d)
            # pywikibot.output('<<green>>Bingó! ' + d['text'] + '<<default>>')

    def year_process(self, page, text):
        """
        Process an article of year number.

        Sometimes dates are not directly in the line of the event, rather
        in one of the previous lines like this:
        * [[január 1.]]
        ** some event 1
        ** some event 2
        """
        year = int(page.title()) # If this throws error, method is called wrong
        search_for_double_asterisk = False
        section = 'years'
        for line in text.splitlines():
            search_for_double_asterisk = \
                search_for_double_asterisk and line.startswith('**')
            # Determine if the lines below represent births or deaths
            sect = re.fullmatch(r'== *\[*(.*?) *== *', line)
            if sect:
                sect_title = sect.group(1)
                if sect_title.startswith('Születések'):
                    section = 'births'
                elif sect_title.startswith('Halálozások'):
                    section = 'deaths'
                else:
                    section = 'years'
                continue
            m = self.dateregex.search(line)
            process = False
            if m:
                # We have just found the date we are looking for :-)
                if line.startswith('*'):
                    line = line[1:].strip()
                    if self.dateregex.fullmatch(line):
                        search_for_double_asterisk = True
                        continue
                process = True
            else:
                process = search_for_double_asterisk
            if line.startswith('**'):
                line = line[2:].strip()
            if process:
                show = line.replace('nowiki>', '') # Just in case
                show = "''<nowiki>" + show + "</nowiki>''"
                d = {
                    'page': page,
                    'year': year,
                    'text': show,
                }
                self.data[section].append(d)
                # pywikibot.output('<<green>>Bingó! ' + d['text'] + '<<default>>')

    def categories(self):
        """
        Return a string with the categories of the result page.

        This string will be copied to the bottom of the page.
        You may write here anything else you want to see at the bottom or
        return '' for leaving it empty.
        """
        footer = f"\n[[Kategória:Évfordulók adatbázisa (’{self.year5} "
        footer += f"és ’{self.year5 + 5})]]\n"
        footer += f"[[Kategória:Évfordulók adatbázisa "
        footer += f"({monthnames[self.month-1].lower()})]]\n"
        return footer

    def process(self, page, mode=0):
        # mode is 0 for ordinary articles and 1 for years.
        try:
            text = page.get()
        except pywikibot.exceptions.NoPageError:
            return # Bot runs slowly, we cannot exclude a deletion meanwhile.
        except pywikibot.exceptions.IsRedirectPageError:
            return
        # OK to run
        # pywikibot.output(text) # debug only
        # First task is to remove references as they often contain dates of
        # publishing. And HTML comments as well, because why not?
        text = references.sub('', text)
        text = HTMLcomments.sub('', text)
        texttuple = self.parse3(text)
        # pywikibot.output(texttuple[1]) # debug only
        if not mode: # An ordinary article
            introtext = self.template_processor(page, texttuple[0])
            # pywikibot.output(introtext) # debug only
            self.birth_death(page, introtext)
            self.other_dates(page, introtext + texttuple[1])
        else: # An article of a year
            text = texttuple[0] + texttuple[1]
            self.year_process(page, text)

    def create_page(self):
        # global hiba - That's my own stuff.
        # For localization modify the 5 section titles below.
        # The template which appears in the anniversaries section of main page:
        templatepage = 'Sablon:Évfordulók' + self.daytitle
        sections = dict()
        # Section title for births
        sections['births'] = 'Születések'
        # Section title for deaths
        sections['deaths'] = 'Halálozások'
        # Section title for infoboxes
        sections['infobox'] = 'Infoboxok'
        # Section title for the remaining
        sections['other'] = 'Egyéb'
        # Section title for the articles about years
        sections['years'] = 'Évszámos cikkek'
        # At the beginning of the page we link the article of the date itself.
        # It is not analyzed by the bot, as it is written for human readers.
        # This is the "main article" template and a warning to sources as they
        # are often missing from articles of dates. Localize this, too.
        outtext = header
        outtext += '{{fő|' + self.date_composer(self.month, self.day) + '}} '
        outtext += '(A dátum saját szócikkét a bot nem dolgozta fel!)\n'
        # Now we link the template where the anniversaries may be written:
        outtext += f'* A kezdőlapra kerülő sablon: [[{templatepage}]]\n'
        outtext += 'A Wikidatában esetenként több dátum is lehet, itt a '
        outtext += 'legvalószínűbb érték jelenik meg.\n'

        # This loop will determine the order of output:
        for sect in sections: # Ordered: Python 3.6+
            def sortkey(item):
                """Make birth and death sections more readable."""
                if sect in ['births', 'deaths']:
                    return (0, 1)[item['page'].title()[0].isdigit()] * 10000 \
                            + item['year']
                return item['year']
            if len(self.data[sect]):
                outtext += '== '+ sections[sect] + ' ==\n'
                for item in sorted(self.data[sect], key=sortkey):
                    outtext += f"* '''{item['page'].title(as_link=True)}''':"
                    outtext += f" {item['text']}\n"
        outtext += self.categories() # Anything you want to write at the bottom
        pywikibot.output(outtext)
        # Write here your bot's summary:
        editsummary = 'Az évfordulók frissítése bottal'

        # And finally, we are ready to save the result!
        page = pywikibot.Page(self.site, self.targetpage)
        try:
            page.put(outtext, editsummary)
        except Exception:
            pywikibot.output(self.daytitle + ' not saved.')
            # hiba = True - That's my own stuff.

    def run(self):
        pywikibot.output(self.date_composer(self.month, self.day))
        # Do we have to process anything at all? Depends on overwrite.
        if not self.overwrite:
            page = pywikibot.Page(self.site, self.targetpage)
            if page.exists():
                pywikibot.output(
                    '<<lightyellow>>' + page.title(as_link=True) + \
                    ' already exists, will be skipped.<<default>>' + \
                    '\nUse -noskip to force the bot to process it.')
                return

        # First run: non-excepted articles
        for page in self.list(self.month, self.day):
            pywikibot.output('<<lightred>>* [[' + page.title() +
                ']]<<default>>')
            self.process(page)

        # Second run: articles about years
        for page in self.year_list():
            pywikibot.output('<<lightred>>* [[' + page.title() +
                ']]<<default>>')
            self.process(page, mode=1)
            # The next lines are necessary unless you want to run the bot all
            # day. Revise them if you have enabled B.C. years in your regex.
            if page.title() >= '995':
                break

        # And finally:
        self.create_page()

########################################################
####                                                ####
####                  Wrappers                      ####
####    A few sample callers. You may add yours.    ####
####                                                ####
########################################################

def one_month(month, yearmodulo5=None, overwrite=False):
    """Process one month."""
    if month not in range(1, 13):
        return
    for i in range(1, days[month-1] + 1):
        bot = CallBot(month, i, yearmodulo5, overwrite)
        bot.run()

def next_month(withmodulo5=False, overwrite=False):
    """
    Process the next month.

    Parameter withmodulo5:
      if True, it looks for anniversaries for the current year modulo 5 (see doc)
      if False, it takes every year
    """
    nextM = today().month + 1
    nextY = today().year
    if nextM > 12:
        nextM = 1
        nextY += 1
    pywikibot.output(f'year={nextY}, month={nextM}')
    if withmodulo5:
        one_month(nextM, nextY % 5, overwrite)
    else:
        one_month(nextM, overwrite=overwrite)

def main(*args):
    mode = None
    overwrite = False
    for arg in pywikibot.handle_args(args):
        if arg in ['nextmonth', 'nextmonth5']:
            mode = arg
        elif arg == '-noskip':
            overwrite = True
    if mode == 'nextmonth':
        next_month(overwrite=overwrite)
    elif mode == 'nextmonth5':
        next_month(True, overwrite)
    else:
        # Wired-in behaviour
        # CallBot takes month, day and a year ending between 0 and 4 (year%5).
        # Sample:
        bot = CallBot(2, 2, 4, overwrite)
        bot.run()

if __name__ == "__main__":
    try:
        # hiba = False
        # fatal = True
        main()
        # fatal = False
    finally:
        # levelez(fatal,hiba)
        pywikibot.stopme()