Structure¶
MirrorManager is divided into four parts:
+--------------------------+                +--------------+
|                          |                |              |
|  MirrorManager frontend  |                |  MirrorList  |
|                          |                |              |
+-----------+--------------+                +------+-------+
            |                                      ^
            |                                      |
            |                                      |pickle
            |                 +-----------+        |
            |                 |           |        |
            |                 |  Backend  +--------+
            |                 |           |
            |                 +-----+-----+
            |                       ^
            |                       |
            |                       v
            |                 +-----+------+
            |                 |            |
            +---------------->|  Database  |
                              |            |
                              +------------+
The database¶
The database stores information about the sites and their mirrors. It also stores information about the files present on the master mirror and the status of the mirrors (whether they are up to date or not).
This is the database schema:
erDiagram
access_stat {
INTEGER category_id PK,FK "indexed"
DATE date PK "indexed"
VARCHAR(255) name PK
FLOAT percent "nullable"
INTEGER requests "nullable,indexed"
}
access_stat_category {
INTEGER id PK
VARCHAR(255) name UK
}
arch {
INTEGER id PK
TEXT name UK
BOOLEAN primary_arch
BOOLEAN publiclist
}
category {
INTEGER id PK
INTEGER product_id FK "nullable"
INTEGER topdir_id FK "nullable"
BOOLEAN admin_only "nullable"
TEXT canonicalhost "nullable"
TEXT geo_dns_domain "nullable"
TEXT name UK
BOOLEAN publiclist
}
category_directory {
INTEGER category_id PK,FK
INTEGER directory_id PK,FK
}
country {
INTEGER id PK
TEXT code UK
}
country_continent_redirect {
INTEGER id PK
TEXT continent
TEXT country UK
}
directory {
INTEGER id PK
BIGINT ctime "nullable"
BLOB files "nullable"
TEXT name UK
BOOLEAN readable "indexed"
}
directory_exclusive_host {
INTEGER id PK
INTEGER directory_id FK
INTEGER host_id FK
}
embargoed_country {
INTEGER id PK
TEXT country_code UK
}
file_detail {
INTEGER id PK
INTEGER directory_id FK
TEXT filename
TEXT md5 "nullable"
TEXT sha1 "nullable"
TEXT sha256 "nullable"
TEXT sha512 "nullable"
BIGINT size "nullable"
BIGINT timestamp "nullable,indexed"
}
file_detail_file_group {
INTEGER id PK
INTEGER file_detail_id FK
INTEGER file_group_id FK
}
file_group {
INTEGER id PK
TEXT name UK
}
host {
INTEGER id PK
INTEGER site_id FK "nullable"
BOOLEAN admin_active "indexed"
INTEGER asn "nullable"
BOOLEAN asn_clients
INTEGER bandwidth_int "nullable"
TEXT comment "nullable"
BLOB config "nullable"
TEXT country "indexed"
INTEGER crawl_failures
TEXT disable_reason "nullable"
BOOLEAN internet2
BOOLEAN internet2_clients
DATETIME last_checked_in "nullable"
BIGINT last_crawl_duration "nullable,indexed"
DATETIME last_crawled "nullable"
BLOB last_crawls "nullable"
FLOAT latitude "nullable"
FLOAT longitude "nullable"
INTEGER max_connections
TEXT name "indexed"
BOOLEAN private "indexed"
TEXT push_ssh_command "nullable"
TEXT push_ssh_host "nullable"
TEXT push_ssh_private_key "nullable"
TEXT robot_email "nullable"
BOOLEAN user_active "indexed"
}
host_acl_ip {
INTEGER id PK
INTEGER host_id FK "nullable"
TEXT ip UK "nullable"
}
host_category {
INTEGER id PK
INTEGER category_id FK "nullable"
INTEGER host_id FK "nullable"
BOOLEAN always_up2date
}
host_category_dir {
INTEGER id PK
INTEGER directory_id FK "nullable"
INTEGER host_category_id FK
TEXT path "nullable"
BOOLEAN up2date "indexed"
}
host_category_url {
INTEGER id PK
INTEGER host_category_id FK
BOOLEAN private "indexed"
TEXT url UK
}
host_country {
INTEGER id PK
INTEGER country_id FK
INTEGER host_id FK
}
host_country_allowed {
INTEGER id PK
INTEGER host_id FK "nullable"
TEXT country UK
}
host_location {
INTEGER id PK
INTEGER host_id FK
INTEGER location_id FK
}
host_netblock {
INTEGER id PK
INTEGER host_id FK "nullable"
TEXT name "nullable"
TEXT netblock "nullable"
}
host_peer_asn {
INTEGER id PK
INTEGER host_id FK "nullable"
INTEGER asn
TEXT name "nullable"
}
host_stats {
INTEGER id PK
INTEGER host_id FK "nullable"
BLOB data "nullable"
DATETIME timestamp
TEXT type "nullable"
}
location {
INTEGER id PK
TEXT name UK
}
mm_group {
INTEGER id PK
DATETIME created
VARCHAR(255) display_name "nullable"
VARCHAR(16) group_name UK
}
mm_user {
INTEGER id PK
DATETIME created
VARCHAR(255) display_name "nullable"
VARCHAR(255) email_address UK
TEXT password "nullable"
VARCHAR(50) token "nullable"
DATETIME updated_on
VARCHAR(16) user_name UK
}
mm_user_group {
INTEGER group_id PK,FK
INTEGER user_id PK,FK
}
mm_user_visit {
INTEGER id PK
INTEGER user_id FK
DATETIME created
DATETIME expiry "nullable"
VARCHAR(50) user_ip
VARCHAR(40) visit_key UK "indexed"
}
netblock_country {
INTEGER id PK
TEXT country
TEXT netblock UK
}
product {
INTEGER id PK
TEXT name UK
BOOLEAN publiclist
}
propagation_stat {
DATETIME datetime PK "indexed"
INTEGER repository_id PK,FK "nullable,indexed"
INTEGER no_info
INTEGER older
INTEGER one_day
INTEGER same_day
INTEGER two_day
}
repository {
INTEGER id PK
INTEGER arch_id FK "nullable"
INTEGER category_id FK "nullable"
INTEGER directory_id FK "nullable"
INTEGER version_id FK "nullable"
BOOLEAN disabled
TEXT name UK
TEXT prefix "nullable,indexed"
}
repository_redirect {
INTEGER id PK
TEXT from_repo UK
TEXT to_repo "nullable"
}
site {
INTEGER id PK
BOOLEAN admin_active "indexed"
BOOLEAN all_sites_can_pull_from_me
DATETIME created_at "indexed"
TEXT created_by
TEXT downstream_comments "nullable"
BOOLEAN email_on_add
BOOLEAN email_on_drop
TEXT name "indexed"
TEXT org_url "nullable"
TEXT password "nullable"
BOOLEAN private "indexed"
BOOLEAN user_active "indexed"
}
site_admin {
INTEGER id PK
INTEGER site_id FK
TEXT username "nullable"
}
site_to_site {
INTEGER id PK
INTEGER downstream_site_id FK
INTEGER upstream_site_id FK
TEXT password "nullable"
TEXT username "nullable"
}
version {
INTEGER id PK
INTEGER product_id FK "nullable"
TEXT codename "nullable"
BOOLEAN display
TEXT display_name "nullable"
BOOLEAN is_test
TEXT name "nullable"
BOOLEAN ordered_mirrorlist
INTEGER sortorder
}
access_stat_category ||--o| access_stat : category_id
product ||--o{ category : product_id
directory ||--o{ category : topdir_id
category ||--o| category_directory : category_id
directory ||--o| category_directory : directory_id
directory ||--o{ directory_exclusive_host : directory_id
host ||--o{ directory_exclusive_host : host_id
directory ||--o{ file_detail : directory_id
file_detail ||--o{ file_detail_file_group : file_detail_id
file_group ||--o{ file_detail_file_group : file_group_id
site ||--o{ host : site_id
host ||--o{ host_acl_ip : host_id
host ||--o{ host_category : host_id
category ||--o{ host_category : category_id
host_category ||--o{ host_category_dir : host_category_id
directory ||--o{ host_category_dir : directory_id
host_category ||--o{ host_category_url : host_category_id
country ||--o{ host_country : country_id
host ||--o{ host_country : host_id
host ||--o{ host_country_allowed : host_id
location ||--o{ host_location : location_id
host ||--o{ host_location : host_id
host ||--o{ host_netblock : host_id
host ||--o{ host_peer_asn : host_id
host ||--o{ host_stats : host_id
mm_user ||--o| mm_user_group : user_id
mm_group ||--o| mm_user_group : group_id
mm_user ||--o{ mm_user_visit : user_id
repository ||--o| propagation_stat : repository_id
category ||--o{ repository : category_id
version ||--o{ repository : version_id
arch ||--o{ repository : arch_id
directory ||--o{ repository : directory_id
site ||--o{ site_admin : site_id
site ||--o{ site_to_site : upstream_site_id
site ||--o{ site_to_site : downstream_site_id
product ||--o{ version : product_id
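As an illustration of how these tables relate, here is a minimal sketch that finds the hosts that are up to date for a given directory. It assumes an in-memory SQLite database containing only a few columns of `host`, `directory`, `host_category` and `host_category_dir`; the helper names (`up_to_date_hosts`, `make_sample_db`) and the sample rows are hypothetical and are not part of MirrorManager.

```python
import sqlite3

# Reduced, illustrative subset of the schema above (most columns omitted).
SCHEMA = """
CREATE TABLE host (id INTEGER PRIMARY KEY, name TEXT, admin_active BOOLEAN,
                   user_active BOOLEAN, private BOOLEAN);
CREATE TABLE directory (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE host_category (id INTEGER PRIMARY KEY,
                            host_id INTEGER REFERENCES host(id),
                            category_id INTEGER);
CREATE TABLE host_category_dir (id INTEGER PRIMARY KEY,
                                host_category_id INTEGER REFERENCES host_category(id),
                                directory_id INTEGER REFERENCES directory(id),
                                up2date BOOLEAN);
"""


def up_to_date_hosts(conn, directory_name):
    """Return names of active, public hosts marked up to date for a directory."""
    rows = conn.execute(
        """
        SELECT DISTINCT host.name
        FROM host
        JOIN host_category ON host_category.host_id = host.id
        JOIN host_category_dir
             ON host_category_dir.host_category_id = host_category.id
        JOIN directory ON directory.id = host_category_dir.directory_id
        WHERE directory.name = ?
          AND host_category_dir.up2date = 1
          AND host.admin_active = 1
          AND host.user_active = 1
          AND host.private = 0
        ORDER BY host.name
        """,
        (directory_name,),
    )
    return [name for (name,) in rows]


def make_sample_db():
    """Build a tiny database with made-up hosts, for demonstration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO host VALUES (?, ?, ?, ?, ?)", [
        (1, "mirror-a.example.org", 1, 1, 0),
        (2, "mirror-b.example.org", 1, 1, 0),
        (3, "mirror-c.example.org", 0, 1, 0),  # disabled by an admin
    ])
    conn.execute("INSERT INTO directory VALUES (1, 'pub/example/releases/1')")
    conn.executemany("INSERT INTO host_category VALUES (?, ?, ?)",
                     [(1, 1, 1), (2, 2, 1), (3, 3, 1)])
    conn.executemany("INSERT INTO host_category_dir VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 1), (2, 2, 1, 0), (3, 3, 1, 1)])
    return conn
```

With the sample data, only `mirror-a.example.org` qualifies: `mirror-b` is not up to date and `mirror-c` is not admin-active.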
The frontend¶
The frontend is a Flask application and one of MirrorManager's user interfaces. Through it, users can register their own public or private mirrors and specify where each mirror is and what it mirrors (everything or only a part of it).
In the frontend, admins can mark a mirror as inactive, which removes it from the list of available mirrors.
Finally, this application presents the list of up-to-date and available mirrors for each product, version, and architecture.
MirrorList¶
MirrorList is a very light WSGI application that users hit when they update their system. It returns the list of mirrors they can query to retrieve their updates, which makes this component very sensitive and important.
This application loads the list of available mirrors from a pickle file generated by the backend. It then runs its heuristic to return a list of mirrors to the user.
The list of mirrors returned is computed based on:
- GeoIP information
- Netblocks

MirrorList tries to point the user to a mirror:
- in the same netblock
- in the same country
- in the same continent
Results are randomized so that the same mirrors are not always hit when a group of computers behind a NAT queries for mirrors.
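The selection described above can be sketched roughly as follows. This is not MirrorList's actual code: the dictionary layout, the `order_mirrors` name, the strict netblock-then-country-then-continent ordering, and the final "everything else" tier are all assumptions made for illustration.

```python
import random


def order_mirrors(mirrors, client, rng=random):
    """Order mirrors by closeness to the client, shuffling within each tier.

    Tiers (an assumed ordering): same netblock, same country,
    same continent, then every remaining mirror.
    """
    def tier(mirror):
        if client["netblock"] is not None and client["netblock"] in mirror["netblocks"]:
            return 0
        if mirror["country"] == client["country"]:
            return 1
        if mirror["continent"] == client["continent"]:
            return 2
        return 3

    tiers = {0: [], 1: [], 2: [], 3: []}
    for mirror in mirrors:
        tiers[tier(mirror)].append(mirror)

    ordered = []
    for key in sorted(tiers):
        bucket = tiers[key]
        # Randomize within a tier so clients behind the same NAT
        # do not all receive the same first mirror.
        rng.shuffle(bucket)
        ordered.extend(bucket)
    return ordered
```

Passing a seeded `random.Random` instance as `rng` makes the ordering reproducible, which is convenient in tests.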
There are two endpoints served by MirrorList:
- /mirrorlist
- /metalink

The first one returns the list of mirrors as text, one per line, with some comments at the top marked by a ‘#’ at the start of the line. The second returns an XML file containing the md5, sha1, sha256, sha512, size, and timestamp of the latest repomd.xml, as well as a list of mirrors where this repomd.xml is available.
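A client consuming the /mirrorlist text format could separate comments from mirror URLs as in the sketch below. The `parse_mirrorlist` helper and the `SAMPLE` response are hypothetical; the exact comment content returned by a real server is not specified here.

```python
def parse_mirrorlist(text):
    """Split a /mirrorlist response into (comments, mirror_urls)."""
    comments, mirrors = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("#"):
            comments.append(line[1:].strip())
        else:
            mirrors.append(line)
    return comments, mirrors


# Made-up response, for illustration only.
SAMPLE = """# repo = example-1 arch = x86_64
http://mirror-a.example.org/pub/example/1/x86_64/
http://mirror-b.example.org/pub/example/1/x86_64/
"""
```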
The backend¶
The backend consists of multiple scripts, called manually or via cron tasks. These scripts rely on the data present in the database and either create or update it.
mm2_update-master-directory-list (UMDL) In short, this script calculates and stores what content the mirrors should have, by inspecting the master mirror.
It browses the local copy of the mirror content and updates the database for each file and folder found in it. This information is crucial, as the crawler uses it to determine whether a mirror is up to date. In addition, if MirrorManager presents an invalid version number or architecture, this script is the likely culprit.
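The general idea of walking the master tree and recording per-file details can be sketched as below. This is only an illustration: `scan_tree` and `sha256_of` are hypothetical helpers, the result is kept in a dictionary rather than written to the database, and hashing every file is an assumption (the real script may record checksums only for selected files).

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_tree(top):
    """Map each directory (relative to top) to details of the files it contains."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()  # walk subdirectories in a stable order
        rel = os.path.relpath(dirpath, top)
        entries = {}
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            entries[name] = {
                "size": st.st_size,
                "timestamp": int(st.st_mtime),
                "sha256": sha256_of(path),
            }
        result[rel] = entries
    return result
```

The recorded fields loosely mirror the `size`, `timestamp`, and `sha256` columns of `file_detail` in the schema.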
mm2_crawler In short, this script calculates and stores what the mirrors do have (and compares that to what they should have).
The crawler goes through all the mirrors listed in the database that are both active and public, and crawls their content to determine, using the information in the database (filled in by the UMDL script), whether each mirror is up to date. It then marks the mirror accordingly.
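The comparison between what a mirror should have and what it does have might look like the following sketch. The criteria the real crawler applies are not described here; matching on file name and size, and the names `is_up_to_date` and `crawl_result`, are assumptions made for illustration.

```python
def is_up_to_date(expected, found):
    """Return True if every expected file is present on the mirror with the same size.

    Both arguments map file names to sizes in bytes.
    """
    return all(
        name in found and found[name] == size
        for name, size in expected.items()
    )


def crawl_result(expected_by_dir, found_by_dir):
    """Compute an up-to-date flag per directory, comparable to host_category_dir.up2date."""
    return {
        directory: is_up_to_date(files, found_by_dir.get(directory, {}))
        for directory, files in expected_by_dir.items()
    }
```

Extra files on the mirror do not count against it in this sketch; only missing or mismatched files do.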
update-EC2-netblocks This script downloads information from Amazon EC2 to keep an up-to-date list of which IPs are on Amazon’s EC2, thus allowing them to use the mirror present there.
get_global_netblocks This script gets global IPv4 and IPv6 netblocks from routeviews.org, allowing MirrorList to look up both a server’s and a client’s network in the Internet routing tables to determine the “closest” mirror network-wise rather than geography-wise.
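Looking up an address in a list of netblocks can be sketched with Python's standard `ipaddress` module. How MirrorList actually consumes the routeviews.org data is not described here; the `closest_netblock` helper and its longest-prefix-match rule (the usual routing-table notion of "most specific") are assumptions.

```python
import ipaddress


def closest_netblock(client_ip, netblocks):
    """Return the most specific netblock containing client_ip, or None.

    netblocks is an iterable of CIDR strings, IPv4 and IPv6 mixed.
    """
    ip = ipaddress.ip_address(client_ip)
    best = None
    for block in netblocks:
        net = ipaddress.ip_network(block)
        if net.version != ip.version:
            continue  # never match an IPv4 address against an IPv6 block
        if ip in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return best
```

A mirror and a client that resolve to the same netblock can then be treated as close network-wise, regardless of their geographic location.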
get_internet2_netblocks This script has the same logic as get_global_netblocks but for Internet2.
move-devel-to-release This script points the development tree of a released product to its release tree. On release day, more mirrors carry the devel branch than the release one, so pointing to the devel tree spreads the traffic over more mirrors. A few weeks later, most mirrors will have caught up, and the devel repositories should then be moved to the release tree, which is what this script is for.
move-to-archive This script moves a tree to the archive. A few days after a release reaches End Of Life, one would want to point the tree of this release to the archives, where it will remain untouched.