refactor: upgrade BeautifulSoup4 with type hints (#999)

* refactor: upgrade BeautifulSoup4 with type hints

Upgrade the BeautifulSoup4 dependency to 4.13.3, which ships with type hints.
Refactor the backends that use BeautifulSoup4 to comply with those type hints.
Apply style simplifications and improvements for consistency.
Remove unused variables and functions.
Remove code duplication between backends for parsing HTML tables.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* build: allow beautifulsoup4 version 4.12.3

Allow the older beautifulsoup4 version 4.12.3 and ensure the code remains compatible with it.
Update library dependencies.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

---------

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Cesar Berrospi Ramis
2025-02-18 11:30:47 +01:00
committed by GitHub
parent 75db61127c
commit 7450050ace
8 changed files with 328 additions and 425 deletions
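The commit message only summarizes the refactor. As a rough illustration of what "complying with the BeautifulSoup4 type hints" and "one shared table parser for several backends" can look like, here is a minimal sketch. `table_to_grid` and `_span` are hypothetical names, not docling functions, and the rowspan/colspan expansion is a generic technique, not the code from this commit. The pattern it shows is narrowing query results with `isinstance(..., Tag)`, because under the type hints the query methods may be typed to return elements other than `Tag`, and `Tag.get()` may return a list or `None` instead of a string.

```python
from bs4 import BeautifulSoup, Tag


def _span(cell: Tag, attr: str) -> int:
    """Read a rowspan/colspan attribute, falling back to 1 if absent or invalid."""
    value = cell.get(attr)
    # Under the type hints, Tag.get() may return str, list[str] or None.
    if isinstance(value, str) and value.strip().isdecimal():
        return max(int(value), 1)
    return 1


def table_to_grid(table: Tag) -> list[list[str]]:
    """Expand an HTML <table> into a rectangular grid of cell texts.

    Cells spanning several rows or columns are copied into every position
    they cover; positions no cell covers are filled with empty strings.
    """
    grid: dict[tuple[int, int], str] = {}
    rows = [tr for tr in table.find_all("tr") if isinstance(tr, Tag)]
    for r, tr in enumerate(rows):
        c = 0
        for cell in tr.find_all(["td", "th"], recursive=False):
            if not isinstance(cell, Tag):  # narrow PageElement -> Tag
                continue
            # Skip positions already occupied by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            text = cell.get_text(" ", strip=True)
            rowspan, colspan = _span(cell, "rowspan"), _span(cell, "colspan")
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    n_rows = max((row for row, _ in grid), default=-1) + 1
    n_cols = max((col for _, col in grid), default=-1) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


if __name__ == "__main__":
    html = (
        "<table><tr><th>A</th><th>B</th></tr>"
        "<tr><td rowspan='2'>x</td><td>1</td></tr>"
        "<tr><td>2</td></tr></table>"
    )
    table = BeautifulSoup(html, "html.parser").find("table")
    if isinstance(table, Tag):  # find() is not guaranteed to return a Tag
        print(table_to_grid(table))  # [['A', 'B'], ['x', '1'], ['x', '2']]
```

Keeping span handling in one helper is what lets several backends share it: each backend only has to locate the `<table>` element and pass it in.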

@@ -410,68 +410,65 @@ item-0 at level 0: unspecified: group _root_
item-396 at level 3: list: group list
item-397 at level 4: list_item: list of books (useful looking abstracts)
item-398 at level 4: list_item: Ducks on postage stamps Archived 2013-05-13 at the Wayback Machine
item-399 at level 4: list_item:
item-400 at level 4: list_item: Ducks at a Distance, by Rob Hine ... uide to identification of US waterfowl
item-401 at level 3: table with [3x2]
item-402 at level 3: picture
item-403 at level 3: list: group list
item-404 at level 4: list_item: Ducks
item-405 at level 4: list_item: Game birds
item-406 at level 4: list_item: Bird common names
item-407 at level 3: list: group list
item-408 at level 4: list_item: All accuracy disputes
item-409 at level 4: list_item: Accuracy disputes from February 2020
item-410 at level 4: list_item: CS1 Finnish-language sources (fi)
item-411 at level 4: list_item: CS1 Latvian-language sources (lv)
item-412 at level 4: list_item: CS1 Swedish-language sources (sv)
item-413 at level 4: list_item: Articles with short description
item-414 at level 4: list_item: Short description is different from Wikidata
item-415 at level 4: list_item: Wikipedia indefinitely move-protected pages
item-416 at level 4: list_item: Wikipedia indefinitely semi-protected pages
item-417 at level 4: list_item: Articles with 'species' microformats
item-418 at level 4: list_item: Articles containing Old English (ca. 450-1100)-language text
item-419 at level 4: list_item: Articles containing Dutch-language text
item-420 at level 4: list_item: Articles containing German-language text
item-421 at level 4: list_item: Articles containing Norwegian-language text
item-422 at level 4: list_item: Articles containing Lithuanian-language text
item-423 at level 4: list_item: Articles containing Ancient Greek (to 1453)-language text
item-424 at level 4: list_item: All articles with self-published sources
item-425 at level 4: list_item: Articles with self-published sources from February 2020
item-426 at level 4: list_item: All articles with unsourced statements
item-427 at level 4: list_item: Articles with unsourced statements from January 2022
item-428 at level 4: list_item: CS1: long volume value
item-429 at level 4: list_item: Pages using Sister project links with wikidata mismatch
item-430 at level 4: list_item: Pages using Sister project links with hidden wikidata
item-431 at level 4: list_item: Webarchive template wayback links
item-432 at level 4: list_item: Articles with Project Gutenberg links
item-433 at level 4: list_item: Articles containing video clips
item-434 at level 3: list: group list
item-435 at level 4: list_item: This page was last edited on 21 September 2024, at 12:11 (UTC).
item-436 at level 4: list_item: Text is available under the Crea ... tion, Inc., a non-profit organization.
item-437 at level 3: list: group list
item-438 at level 4: list_item: Privacy policy
item-439 at level 4: list_item: About Wikipedia
item-440 at level 4: list_item: Disclaimers
item-441 at level 4: list_item: Contact Wikipedia
item-442 at level 4: list_item: Code of Conduct
item-443 at level 4: list_item: Developers
item-444 at level 4: list_item: Statistics
item-445 at level 4: list_item: Cookie statement
item-446 at level 4: list_item: Mobile view
item-399 at level 4: list_item: Ducks at a Distance, by Rob Hine ... uide to identification of US waterfowl
item-400 at level 3: table with [3x2]
item-401 at level 3: picture
item-402 at level 3: list: group list
item-403 at level 4: list_item: Ducks
item-404 at level 4: list_item: Game birds
item-405 at level 4: list_item: Bird common names
item-406 at level 3: list: group list
item-407 at level 4: list_item: All accuracy disputes
item-408 at level 4: list_item: Accuracy disputes from February 2020
item-409 at level 4: list_item: CS1 Finnish-language sources (fi)
item-410 at level 4: list_item: CS1 Latvian-language sources (lv)
item-411 at level 4: list_item: CS1 Swedish-language sources (sv)
item-412 at level 4: list_item: Articles with short description
item-413 at level 4: list_item: Short description is different from Wikidata
item-414 at level 4: list_item: Wikipedia indefinitely move-protected pages
item-415 at level 4: list_item: Wikipedia indefinitely semi-protected pages
item-416 at level 4: list_item: Articles with 'species' microformats
item-417 at level 4: list_item: Articles containing Old English (ca. 450-1100)-language text
item-418 at level 4: list_item: Articles containing Dutch-language text
item-419 at level 4: list_item: Articles containing German-language text
item-420 at level 4: list_item: Articles containing Norwegian-language text
item-421 at level 4: list_item: Articles containing Lithuanian-language text
item-422 at level 4: list_item: Articles containing Ancient Greek (to 1453)-language text
item-423 at level 4: list_item: All articles with self-published sources
item-424 at level 4: list_item: Articles with self-published sources from February 2020
item-425 at level 4: list_item: All articles with unsourced statements
item-426 at level 4: list_item: Articles with unsourced statements from January 2022
item-427 at level 4: list_item: CS1: long volume value
item-428 at level 4: list_item: Pages using Sister project links with wikidata mismatch
item-429 at level 4: list_item: Pages using Sister project links with hidden wikidata
item-430 at level 4: list_item: Webarchive template wayback links
item-431 at level 4: list_item: Articles with Project Gutenberg links
item-432 at level 4: list_item: Articles containing video clips
item-433 at level 3: list: group list
item-434 at level 4: list_item: This page was last edited on 21 September 2024, at 12:11 (UTC).
item-435 at level 4: list_item: Text is available under the Crea ... tion, Inc., a non-profit organization.
item-436 at level 3: list: group list
item-437 at level 4: list_item: Privacy policy
item-438 at level 4: list_item: About Wikipedia
item-439 at level 4: list_item: Disclaimers
item-440 at level 4: list_item: Contact Wikipedia
item-441 at level 4: list_item: Code of Conduct
item-442 at level 4: list_item: Developers
item-443 at level 4: list_item: Statistics
item-444 at level 4: list_item: Cookie statement
item-445 at level 4: list_item: Mobile view
item-446 at level 3: list: group list
item-447 at level 3: list: group list
item-448 at level 4: list_item:
item-449 at level 4: list_item:
item-450 at level 3: list: group list
item-451 at level 1: caption: Pacific black duck displaying the characteristic upending "duck"
item-452 at level 1: caption: Male mallard.
item-453 at level 1: caption: Wood ducks.
item-454 at level 1: caption: Mallard landing in approach
item-455 at level 1: caption: Male Mandarin duck
item-456 at level 1: caption: Flying steamer ducks in Ushuaia, Argentina
item-457 at level 1: caption: Female mallard in Cornwall, England
item-458 at level 1: caption: Pecten along the bill
item-459 at level 1: caption: Mallard duckling preening
item-460 at level 1: caption: A Muscovy duckling
item-461 at level 1: caption: Ringed teal
item-462 at level 1: caption: Indian Runner ducks, a common breed of domestic ducks
item-463 at level 1: caption: Three black-colored ducks in the coat of arms of Maaninka[49]
item-448 at level 1: caption: Pacific black duck displaying the characteristic upending "duck"
item-449 at level 1: caption: Male mallard.
item-450 at level 1: caption: Wood ducks.
item-451 at level 1: caption: Mallard landing in approach
item-452 at level 1: caption: Male Mandarin duck
item-453 at level 1: caption: Flying steamer ducks in Ushuaia, Argentina
item-454 at level 1: caption: Female mallard in Cornwall, England
item-455 at level 1: caption: Pecten along the bill
item-456 at level 1: caption: Mallard duckling preening
item-457 at level 1: caption: A Muscovy duckling
item-458 at level 1: caption: Ringed teal
item-459 at level 1: caption: Indian Runner ducks, a common breed of domestic ducks
item-460 at level 1: caption: Three black-colored ducks in the coat of arms of Maaninka[49]

@@ -1413,9 +1413,6 @@
},
{
"$ref": "#/texts/350"
},
{
"$ref": "#/texts/351"
}
],
"content_layer": "body",
@@ -1428,14 +1425,14 @@
"$ref": "#/texts/341"
},
"children": [
{
"$ref": "#/texts/351"
},
{
"$ref": "#/texts/352"
},
{
"$ref": "#/texts/353"
},
{
"$ref": "#/texts/354"
}
],
"content_layer": "body",
@@ -1448,6 +1445,9 @@
"$ref": "#/texts/341"
},
"children": [
{
"$ref": "#/texts/354"
},
{
"$ref": "#/texts/355"
},
@@ -1522,9 +1522,6 @@
},
{
"$ref": "#/texts/379"
},
{
"$ref": "#/texts/380"
}
],
"content_layer": "body",
@@ -1538,10 +1535,10 @@
},
"children": [
{
"$ref": "#/texts/381"
"$ref": "#/texts/380"
},
{
"$ref": "#/texts/382"
"$ref": "#/texts/381"
}
],
"content_layer": "body",
@@ -1554,6 +1551,9 @@
"$ref": "#/texts/341"
},
"children": [
{
"$ref": "#/texts/382"
},
{
"$ref": "#/texts/383"
},
@@ -1577,9 +1577,6 @@
},
{
"$ref": "#/texts/390"
},
{
"$ref": "#/texts/391"
}
],
"content_layer": "body",
@@ -1591,14 +1588,7 @@
"parent": {
"$ref": "#/texts/341"
},
"children": [
{
"$ref": "#/texts/392"
},
{
"$ref": "#/texts/393"
}
],
"children": [],
"content_layer": "body",
"name": "list",
"label": "list"
@@ -6774,27 +6764,13 @@
"content_layer": "body",
"label": "list_item",
"prov": [],
"orig": "",
"text": "",
"enumerated": false,
"marker": "-"
},
{
"self_ref": "#/texts/351",
"parent": {
"$ref": "#/groups/42"
},
"children": [],
"content_layer": "body",
"label": "list_item",
"prov": [],
"orig": "Ducks at a Distance, by Rob Hines at Project Gutenberg - A modern illustrated guide to identification of US waterfowl",
"text": "Ducks at a Distance, by Rob Hines at Project Gutenberg - A modern illustrated guide to identification of US waterfowl",
"enumerated": false,
"marker": "-"
},
{
"self_ref": "#/texts/352",
"self_ref": "#/texts/351",
"parent": {
"$ref": "#/groups/43"
},
@@ -6808,7 +6784,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/353",
"self_ref": "#/texts/352",
"parent": {
"$ref": "#/groups/43"
},
@@ -6822,7 +6798,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/354",
"self_ref": "#/texts/353",
"parent": {
"$ref": "#/groups/43"
},
@@ -6836,7 +6812,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/355",
"self_ref": "#/texts/354",
"parent": {
"$ref": "#/groups/44"
},
@@ -6850,7 +6826,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/356",
"self_ref": "#/texts/355",
"parent": {
"$ref": "#/groups/44"
},
@@ -6864,7 +6840,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/357",
"self_ref": "#/texts/356",
"parent": {
"$ref": "#/groups/44"
},
@@ -6878,7 +6854,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/358",
"self_ref": "#/texts/357",
"parent": {
"$ref": "#/groups/44"
},
@@ -6892,7 +6868,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/359",
"self_ref": "#/texts/358",
"parent": {
"$ref": "#/groups/44"
},
@@ -6906,7 +6882,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/360",
"self_ref": "#/texts/359",
"parent": {
"$ref": "#/groups/44"
},
@@ -6920,7 +6896,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/361",
"self_ref": "#/texts/360",
"parent": {
"$ref": "#/groups/44"
},
@@ -6934,7 +6910,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/362",
"self_ref": "#/texts/361",
"parent": {
"$ref": "#/groups/44"
},
@@ -6948,7 +6924,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/363",
"self_ref": "#/texts/362",
"parent": {
"$ref": "#/groups/44"
},
@@ -6962,7 +6938,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/364",
"self_ref": "#/texts/363",
"parent": {
"$ref": "#/groups/44"
},
@@ -6976,7 +6952,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/365",
"self_ref": "#/texts/364",
"parent": {
"$ref": "#/groups/44"
},
@@ -6990,7 +6966,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/366",
"self_ref": "#/texts/365",
"parent": {
"$ref": "#/groups/44"
},
@@ -7004,7 +6980,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/367",
"self_ref": "#/texts/366",
"parent": {
"$ref": "#/groups/44"
},
@@ -7018,7 +6994,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/368",
"self_ref": "#/texts/367",
"parent": {
"$ref": "#/groups/44"
},
@@ -7032,7 +7008,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/369",
"self_ref": "#/texts/368",
"parent": {
"$ref": "#/groups/44"
},
@@ -7046,7 +7022,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/370",
"self_ref": "#/texts/369",
"parent": {
"$ref": "#/groups/44"
},
@@ -7060,7 +7036,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/371",
"self_ref": "#/texts/370",
"parent": {
"$ref": "#/groups/44"
},
@@ -7074,7 +7050,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/372",
"self_ref": "#/texts/371",
"parent": {
"$ref": "#/groups/44"
},
@@ -7088,7 +7064,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/373",
"self_ref": "#/texts/372",
"parent": {
"$ref": "#/groups/44"
},
@@ -7102,7 +7078,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/374",
"self_ref": "#/texts/373",
"parent": {
"$ref": "#/groups/44"
},
@@ -7116,7 +7092,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/375",
"self_ref": "#/texts/374",
"parent": {
"$ref": "#/groups/44"
},
@@ -7130,7 +7106,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/376",
"self_ref": "#/texts/375",
"parent": {
"$ref": "#/groups/44"
},
@@ -7144,7 +7120,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/377",
"self_ref": "#/texts/376",
"parent": {
"$ref": "#/groups/44"
},
@@ -7158,7 +7134,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/378",
"self_ref": "#/texts/377",
"parent": {
"$ref": "#/groups/44"
},
@@ -7172,7 +7148,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/379",
"self_ref": "#/texts/378",
"parent": {
"$ref": "#/groups/44"
},
@@ -7186,7 +7162,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/380",
"self_ref": "#/texts/379",
"parent": {
"$ref": "#/groups/44"
},
@@ -7200,7 +7176,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/381",
"self_ref": "#/texts/380",
"parent": {
"$ref": "#/groups/45"
},
@@ -7214,7 +7190,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/382",
"self_ref": "#/texts/381",
"parent": {
"$ref": "#/groups/45"
},
@@ -7228,7 +7204,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/383",
"self_ref": "#/texts/382",
"parent": {
"$ref": "#/groups/46"
},
@@ -7242,7 +7218,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/384",
"self_ref": "#/texts/383",
"parent": {
"$ref": "#/groups/46"
},
@@ -7256,7 +7232,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/385",
"self_ref": "#/texts/384",
"parent": {
"$ref": "#/groups/46"
},
@@ -7270,7 +7246,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/386",
"self_ref": "#/texts/385",
"parent": {
"$ref": "#/groups/46"
},
@@ -7284,7 +7260,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/387",
"self_ref": "#/texts/386",
"parent": {
"$ref": "#/groups/46"
},
@@ -7298,7 +7274,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/388",
"self_ref": "#/texts/387",
"parent": {
"$ref": "#/groups/46"
},
@@ -7312,7 +7288,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/389",
"self_ref": "#/texts/388",
"parent": {
"$ref": "#/groups/46"
},
@@ -7326,7 +7302,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/390",
"self_ref": "#/texts/389",
"parent": {
"$ref": "#/groups/46"
},
@@ -7340,7 +7316,7 @@
"marker": "-"
},
{
"self_ref": "#/texts/391",
"self_ref": "#/texts/390",
"parent": {
"$ref": "#/groups/46"
},
@@ -7352,34 +7328,6 @@
"text": "Mobile view",
"enumerated": false,
"marker": "-"
},
{
"self_ref": "#/texts/392",
"parent": {
"$ref": "#/groups/47"
},
"children": [],
"content_layer": "body",
"label": "list_item",
"prov": [],
"orig": "",
"text": "",
"enumerated": false,
"marker": "-"
},
{
"self_ref": "#/texts/393",
"parent": {
"$ref": "#/groups/47"
},
"children": [],
"content_layer": "body",
"label": "list_item",
"prov": [],
"orig": "",
"text": "",
"enumerated": false,
"marker": "-"
}
],
"pictures": [

@@ -473,7 +473,6 @@ The 1992 Disney film The Mighty Ducks, starring Emilio Estevez, chose the duck a
- list of books (useful looking abstracts)
- Ducks on postage stamps Archived 2013-05-13 at the Wayback Machine
-
- Ducks at a Distance, by Rob Hines at Project Gutenberg - A modern illustrated guide to identification of US waterfowl
| Authority control databases | Authority control databases |
@@ -526,7 +525,4 @@ additional terms may apply. By using this site, you agree to the Terms of Use an
- Developers
- Statistics
- Cookie statement
- Mobile view
-
-
- Mobile view
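In the regenerated ground truth, several empty list items are dropped: the bare `-` bullets in the Markdown output and the `list_item` entries with an empty `text` in the JSON. The diff does not show which backend code produces this. The following is only a minimal, hypothetical sketch of a filter with the same effect, assuming list items are read from a parsed `<ul>`/`<ol>` with BeautifulSoup4. `list_item_texts` is an illustrative name, not docling's API.

```python
from bs4 import BeautifulSoup, Tag


def list_item_texts(list_tag: Tag) -> list[str]:
    """Collect the text of each direct <li> child, dropping empty items."""
    items: list[str] = []
    for li in list_tag.find_all("li", recursive=False):
        if not isinstance(li, Tag):  # narrow for the type checker
            continue
        text = li.get_text(" ", strip=True)
        # An <li> with no visible text (e.g. only an icon or an empty link)
        # would otherwise be serialized as a bare "-" bullet.
        if text:
            items.append(text)
    return items


if __name__ == "__main__":
    html = "<ul><li>Ducks</li><li> </li><li><a href='#'></a></li><li>Game birds</li></ul>"
    ul = BeautifulSoup(html, "html.parser").find("ul")
    if isinstance(ul, Tag):
        print(list_item_texts(ul))  # ['Ducks', 'Game birds']
```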