Structure
=========

MirrorManager is divided into four parts:

::

    +--------------------------+                +--------------+
    |                          |                |              |
    |  MirrorManager frontend  |                |  MirrorList  |
    |                          |                |              |
    +-----------+--------------+                +------+-------+
                |                                      ^
                |                                      |
                |                                      |pickle
                |                 +-----------+        |
                |                 |           |        |
                |                 |  Backend  +--------+
                |                 |           |
                |                 +-----+-----+
                |                       ^
                |                       |
                |                       v
                |                 +-----+------+
                |                 |            |
                +---------------->|  Database  |
                                  |            |
                                  +------------+


The database
------------

The database stores the information about the sites and their mirrors.
It also stores the information about the files present on the master mirror
and the status of the mirrors (whether they are up to date or not).

.. include:: database_schema.md
   :parser: myst_parser.sphinx_


The Frontend
------------

The frontend is a Flask application and one of the user interfaces.

Via the frontend, users can register their own public or private mirror. They
can specify where the mirror is and what it is mirroring (everything or just
a part of it).

In the frontend, admins can mark a mirror as inactive, which removes it from
the list of available mirrors.

Finally, this application presents the list of up-to-date and available
mirrors for each product, version and architecture.


MirrorList
----------

MirrorList is a very light WSGI application which is hit by users when they
update their system. It returns the list of mirrors they can query to
retrieve their updates, which makes this component very sensitive and
important.

This application loads the list of available mirrors from a pickle file
generated by the backend. It then runs its heuristic to return a list of
mirrors to the user.

The list of mirrors returned is computed based on

- GeoIP information
- Netblocks

trying to point the user to a mirror:

- in the same netblock
- in the same country
- in the same continent

Results are randomized so that it is not always the same mirrors that are
hit when a group of computers behind a NAT queries for mirrors.

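
The ordering described above can be sketched as follows. This is only an
illustration, not MirrorList's actual code: the ``Mirror`` record, the
``order_mirrors`` function and the final "everything else" tier are
assumptions made for the example, and the client's country and continent are
passed in directly instead of being looked up through GeoIP.

.. code-block:: python

    import ipaddress
    import random
    from dataclasses import dataclass


    @dataclass(frozen=True)
    class Mirror:
        # Hypothetical record; the real pickle layout is not described here.
        url: str
        netblock: str   # CIDR block of the mirror's network
        country: str
        continent: str


    def order_mirrors(mirrors, client_ip, client_country, client_continent,
                      rng=random):
        """Order mirrors by proximity tier, shuffled within each tier."""
        addr = ipaddress.ip_address(client_ip)
        # Tiers: same netblock, same country, same continent, everything else
        # (the last tier is an assumption of this sketch).
        tiers = [[], [], [], []]
        for mirror in mirrors:
            if addr in ipaddress.ip_network(mirror.netblock):
                tiers[0].append(mirror)
            elif mirror.country == client_country:
                tiers[1].append(mirror)
            elif mirror.continent == client_continent:
                tiers[2].append(mirror)
            else:
                tiers[3].append(mirror)
        ordered = []
        for tier in tiers:
            # Shuffle so clients behind one NAT do not all hit the same mirror.
            rng.shuffle(tier)
            ordered.extend(tier)
        return ordered

Shuffling inside each tier, and not across the whole list, keeps the closest
mirrors first while still spreading the load among equally close ones.
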

There are two endpoints served by MirrorList:

- /mirrorlist
- /metalink

The first one returns the list of mirrors as text, one per line, with some
comments at the top marked by a '#' at the start of the line.

The second returns an XML file containing the md5, sha1, sha256 and sha512
checksums, the size and the timestamp of the latest repomd.xml, as well as a
list of mirrors where this repomd.xml can be found.


The backend
-----------

The backend consists of multiple scripts called manually or via cron jobs.
These scripts rely on the data present in the database and either create or
update it.

* **mm2_update-master-directory-list** (UMDL)

  In short, this script calculates and stores what content the mirrors
  *should* have, by inspecting the master mirror.

  It browses the local copy of the mirror content and updates the database
  for each file and folder found in it.
  This information is crucial, as it is what the crawler uses to determine
  whether a mirror is up to date or not. In addition, if MirrorManager
  presents an invalid version number or architecture, this script is likely
  the culprit.

* **mm2_crawler**

  In short, this script calculates and stores what the mirrors *do* have
  (and compares that to what they should have).

  The crawler goes through all the mirrors listed in the database that are
  both active and public, and crawls through their content. Using the
  information in the database (filled in by the UMDL script), it determines
  whether each mirror is up to date and marks it accordingly.

* **update-EC2-netblocks**

  This script downloads information from Amazon EC2 to keep an up-to-date
  list of which IPs belong to Amazon's EC2, thus allowing them to use the
  mirrors present there.

* **get_global_netblocks**

  This script gets global IPv4 and IPv6 netblocks from routeviews.org,
  allowing MirrorList to look up both a server's and a client's network in
  the internet routing tables to determine the "closest" mirror network-wise
  rather than geography-wise.


* **get_internet2_netblocks**

  This script has the same logic as ``get_global_netblocks`` but for
  Internet2.

* **move-devel-to-release**

  This script points the development tree of a released product to its
  release tree.

  On release day, more mirrors will be carrying the devel branch than the
  release one, so pointing to the devel tree spreads the traffic over more
  mirrors. A few weeks later, most mirrors will have caught up and the devel
  repositories should be moved to the release tree, for which you can use
  this script.

* **move-to-archive**

  This script moves a tree to the archive.

  A few days after a release reaches End Of Life, one would want to point
  the tree of this release to the archives, where it will remain untouched.
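
The relationship between the UMDL script (what mirrors *should* have) and
the crawler (what mirrors *do* have) can be illustrated with a minimal
sketch. It is not the crawler's actual logic: the function names and the
choice of comparing file sizes keyed by relative path are assumptions made
purely for the example.

.. code-block:: python

    def stale_paths(expected, observed):
        """Return the paths whose observed size differs from the expected one.

        Both arguments map a relative path to a file size in bytes
        (a hypothetical representation chosen for this sketch).
        A path missing from ``observed`` counts as stale.
        """
        return sorted(
            path for path, size in expected.items()
            if observed.get(path) != size
        )


    def is_up_to_date(expected, observed):
        """A mirror is up to date when no expected path is stale."""
        return not stale_paths(expected, observed)


    # What the master mirror holds, as recorded by a UMDL-like step.
    expected = {"repodata/repomd.xml": 3100, "Packages/a.rpm": 52000}
    # What a crawl of one mirror found.
    observed = {"repodata/repomd.xml": 2900, "Packages/a.rpm": 52000}

    print(stale_paths(expected, observed))
    print(is_up_to_date(expected, observed))

Extra files on the mirror are ignored in this sketch; only content the
master is known to have takes part in the comparison.
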