Retrieve and download all article URLs from a Blogger blog
The problem
It is difficult, if not impossible, to retrieve or download the URLs of blog articles in Blogger. For example, I have several blogs on Blogger, and creating simple internal links is a nightmare because the Blogger interface does not detect existing articles when you want to link to them. The ideal would be to have all your Blogger article URLs in one list and build the links on your own.
The Python script
- 1 – Simply run the Python script below:
# Import the necessary modules
import requests
import xml.etree.ElementTree as ET

# Ask the user to provide a Blogger blog URL that contains an XML file
url = input("Enter a Blogger blog URL that contains an XML file: ")

# Send an HTTP GET request to the URL and stop if the server reports an error
response = requests.get(url, timeout=30)
response.raise_for_status()
xml = response.text

# Parse the XML; fromstring returns the root <feed> element
root = ET.fromstring(xml)

# Find the <link rel="alternate"> element of each <entry> (one per article).
# Searching inside <entry> skips the feed-level link to the blog home page.
ns = "{http://www.w3.org/2005/Atom}"
links = root.findall(f".//{ns}entry/{ns}link[@rel='alternate']")

# Create an empty list to store article URLs
articles = []

# Loop over all <link> elements found
for link in links:
    # Extract the href attribute, which contains the article URL
    article_url = link.attrib["href"]
    # Add the URL to the article list
    articles.append(article_url)

# Show the number of articles found
print(f"Number of articles found: {len(articles)}")

# Show the article URLs
for article in articles:
    print(article)
- 2 – When the script starts, it displays the following prompt:
Enter a Blogger blog URL that contains an XML file:
- 3 – Provide your blog URL in the following form: https://yourblog.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500
- 4 – You must provide the URL in exactly this form. It points to the blog's Atom feed, which works like a sitemap and lists most of your Blogger blog articles. For example, for one of my test blogs, https://creditsbancaires.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500, the script retrieved all of its article URLs.
Simple and straightforward.
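If your blog has more articles than a single request returns, one option is to walk through the feed page by page using the same start-index and max-results parameters that appear in the URL above. The sketch below is only an illustration, not part of the original script: the helper names (parse_page, fetch_with_requests, collect_all_urls) are hypothetical, and it assumes the feed simply returns no entries once start-index goes past the last article.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_page(xml_text):
    """Return (number of entries, article URLs) for one page of an Atom feed."""
    root = ET.fromstring(xml_text)
    entries = root.findall(f"{ATOM}entry")
    urls = []
    for entry in entries:
        for link in entry.findall(f"{ATOM}link"):
            if link.get("rel") == "alternate" and link.get("href"):
                urls.append(link.get("href"))
    return len(entries), urls


def fetch_with_requests(url):
    """Download one feed page; requests is imported only when this is called."""
    import requests
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def collect_all_urls(blog_root, fetch=fetch_with_requests, page_size=500, max_pages=100):
    """Request feed pages until one comes back with no entries (assumed end of feed)."""
    urls = []
    start = 1
    for _ in range(max_pages):  # hard stop so a misbehaving feed cannot loop forever
        page_url = (f"{blog_root}/atom.xml?redirect=false"
                    f"&start-index={start}&max-results={page_size}")
        count, page_urls = parse_page(fetch(page_url))
        if count == 0:
            break
        urls.extend(page_urls)
        start += count  # advance by the number of entries actually returned
    return urls
```

You would call it with the blog address without the feed path, for example collect_all_urls("https://yourblog.blogspot.com"). Advancing by the number of entries actually received, rather than by page_size, keeps the loop correct even if the server returns fewer entries than requested.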
I don’t know anything about Python, how do I run this script?
You don’t need to know Python to use this script. Copy the code and save it in a .py file, for example url-blogger.py. You then run it from the command line, but Python must be installed first…
I can already see you grimacing at all these steps.
No worries, I’ll make it easier for you.
- Go to Thonny.org and download the version for your operating system: Windows, Mac or Linux. It is a Python IDE for beginners, very lightweight and very easy to install.
- Once it is installed, you need one package that this script depends on, requests, and Thonny can install it in just a few clicks.
- Go to Tools/Manage Packages and, at the top left, type “requests” in the search bar.
- The package will appear and you can install it in one click. If Thonny cannot install a package because of an error, you can download the package file manually (search for it by name on Google) and have Thonny install it from the file you downloaded. The requests package, however, is available directly in Thonny.
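If you want to confirm that the installation worked before going further, you can run a tiny check like the one below in Thonny. This is just a sketch using Python's standard importlib module, and is_installed is a hypothetical helper name, not something Thonny provides.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Prints True once the requests package is available to this Python interpreter
print("requests installed:", is_installed("requests"))
```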
Next, create a new file and paste the script into it. Press F5, provide your Blogger blog URL as shown above, and voilà!
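The script only prints the URLs on screen. If you would rather keep them in a file to work on your links later, a small helper along these lines could be added. It is a minimal sketch: save_urls and the file name blogger-urls.txt are hypothetical choices, and removing duplicate URLs is my own assumption about what is convenient.

```python
from pathlib import Path


def save_urls(urls, path="blogger-urls.txt"):
    """Write unique URLs to a text file, one per line, keeping first-seen order."""
    unique = list(dict.fromkeys(urls))  # dict keys preserve insertion order
    Path(path).write_text("".join(u + "\n" for u in unique), encoding="utf-8")
    return len(unique)
```

In the script above, you would call save_urls(articles) after the loop that fills the articles list; the file is created in the folder the script runs from.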