Retrieve and download all the URLs of articles on a Blogger blog


    The problem

    It is difficult, if not impossible, to retrieve or download the URLs of the articles of a blog hosted on Blogger. For example, I have several blogs on Blogger and making simple internal links is a nightmare, because the Blogger interface does not suggest existing articles when you want to link to them. The ideal is to have the full list of URLs of your Blogger articles and build the links yourself.

    The Python script

    • 1 – Simply run the Python script below:

    # Import the necessary modules
    import requests
    import xml.etree.ElementTree as ET

    # Ask the user to provide a Blogger blog URL that contains an XML file
    url = input("Enter a Blogger blog URL that contains an XML file: ")

    # Send HTTP GET request to URL and get XML content
    response = requests.get(url)
    xml = response.text

    # Create an ElementTree object to parse the XML
    tree = ET.fromstring(xml)

    # Find all <link> elements that have the rel="alternate" attribute
    links = tree.findall(".//{http://www.w3.org/2005/Atom}link[@rel='alternate']")

    # Create an empty list to store article URLs
    articles = []

    # Browse all <link> elements found
    for link in links:
        # Extract the href attribute, which contains the article URL
        article_url = link.attrib["href"]
        # Add the URL to the article list
        articles.append(article_url)

    # Show the number of articles found
    print(f"Number of articles found: {len(articles)}")

    # Show the article URLs
    for article in articles:
        print(article)
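
    If you would also like to keep the list instead of copying it from the console, here is a small optional addition (my own suggestion, not part of the original script; the file name blogger-urls.txt is arbitrary) that you can append at the end:

    # Optional: save the URLs to a text file (file name chosen arbitrarily)
    with open("blogger-urls.txt", "w", encoding="utf-8") as f:
        for article in articles:
            f.write(article + "\n")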

    • 2 – When the script runs, it displays a prompt titled:

    Enter a Blogger blog URL that contains an XML file:

    The input prompt of the Python script for downloading all the URLs of the articles of a Blogger blog

    • 3 – Provide your blog URL in the following form: https://yourblog.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500

    Provide the URL of your Blogger blog in a specific format to be able to retrieve all the URLs of your posts.

    • 4 – You must provide the URL in exactly this form. It is a kind of sitemap (an Atom feed) that contains most of your Blogger blog's articles; note that a single request returns at most 500 posts (see the sketch below if your blog has more). For example, for one of my test blogs, https://creditsbancaires.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500, here are all the URLs that it retrieved.

    All the URLs of a Blogger blog downloaded by the Python script

    Simple and straightforward.
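
    Since the feed URL caps each request at 500 entries (the max-results parameter), a blog with more posts needs several requests. Here is a minimal sketch of how that could be done; it is my own extension, not part of the original script, and it assumes the start-index parameter already present in the URL can be used to page through the feed. The blog address is a placeholder to replace with your own.

    # Sketch: page through the Atom feed 500 posts at a time using start-index
    import requests
    import xml.etree.ElementTree as ET

    blog = "https://yourblog.blogspot.com"  # placeholder, use your own blog address
    ns = "{http://www.w3.org/2005/Atom}"
    articles = []
    start = 1

    while True:
        feed_url = f"{blog}/atom.xml?redirect=false&start-index={start}&max-results=500"
        tree = ET.fromstring(requests.get(feed_url, timeout=30).text)
        entries = tree.findall(f".//{ns}entry")
        if not entries:
            break  # no more posts in the feed
        for entry in entries:
            link = entry.find(f"{ns}link[@rel='alternate']")
            if link is not None:
                articles.append(link.attrib["href"])
        start += 500

    print(f"Number of articles found: {len(articles)}")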

    I don’t know anything about Python, how do I run this script?

    You don’t need to know Python to use this script. You copy the code and save it in a .py file, for example url-blogger.py. Then you run it from the command line, but you need to have Python installed first…
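
    For reference, once Python is installed, running the script comes down to a single command in a terminal opened in the folder where you saved the file, for example python url-blogger.py (assuming you kept that file name).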

    I can already see that you are grimacing at all these steps.

    No worries, I’ll make it easier for you.

    • Go to Thonny.org and download the version for your operating system: Windows, Mac or Linux. It is a Python IDE for beginners, very lightweight and very easy to install.
    • Once it is installed, you need to add the one package this script requires, requests; Thonny can do it in just a few clicks.
    • Go to Tools/Manage Packages and, in the search bar at the top left, type “requests”.
    • The package will appear and you can install it in one click. If Thonny cannot install a package because of an error, you can also download the package manually by searching for it by name on Google, and Thonny can then install it from the file you downloaded. But the requests package is available directly in Thonny.

    Installing a package in the Thonny Python IDE

    Searching for a Python package in the Thonny IDE

    The package found and ready to be installed in Thonny
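
    If you are comfortable with a terminal, the same package can also be installed outside Thonny by running pip install requests, assuming Python and pip are already installed.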

    Next, create a new file and copy-paste the script into it. Press F5, provide your Blogger blog URL as shown above, and voilà!

    Houssen Moshinaly

