How To Use a Proxy Server With Python requests Module - Ultimate Guide
Ștefan Răcila on Apr 20 2023
Introduction
Web scraping is a powerful tool for extracting valuable information from websites. However, it can also put a strain on the servers of the sites you scrape, which is why many websites block IP addresses that make too many requests. To avoid this, you can route your requests through proxies. In this article, I'll show you how to use proxies with Python's requests module and how to rotate proxy IPs to avoid getting blocked.
Setting Up
Before we begin, you'll need to have the following prerequisites:
✅ Python installed
✅ Some experience with Python
✅ The Python requests library installed
✅ A list of proxy IPs and ports
To install Python, you'll need to download the Python installer from the official Python website: https://www.python.org/downloads/
You can choose the latest version of Python 3. It is recommended to use the latest version of Python for the latest features and security updates.
Once the download is complete, run the installer and follow the prompts to install Python on your computer. During the installation process, make sure to check the option to add Python to your system's PATH, which will allow you to run Python from the command line.
After the installation is complete, you can verify that Python is installed correctly by opening a command prompt or terminal and running the command `python --version`. This should display the version of Python that you have installed.
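For example (the exact version shown will depend on your installation):
$ python --version
Python 3.x.x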
You can check whether the `requests` package is installed by opening the terminal and running the following command:
$ pip freeze
`pip` is a package manager that should come bundled with newer versions of Python. If you need to install `pip` separately for any reason, you can follow the instructions in this guide.
`pip freeze` will display all your currently installed Python packages and their versions. Check if the `requests` module is present in that list. If not, install it by running the following command:
$ pip install requests
Configuring the proxies
I'll explain in this section how to set up proxies with `python-requests`. To get started, we need a working proxy and the URL we want to send the request to.
Basic usage
import requests

# Map each protocol to the proxy that should handle it
proxies = {
    'http': 'http://proxy_ip:proxy_port',
    'https': 'http://secure_proxy_ip:proxy_port',
}

res = requests.get('https://httpbin.org/get', proxies=proxies)
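Since `https://httpbin.org/get` echoes the details of your request back to you, including the `origin` IP, you can print `res.text` and check that the IP shown belongs to your proxy rather than your own machine.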
If you do not have any private proxy that you can use to test this code, you can find a free public proxy from the list on freeproxylists.net. Please note that the proxies on that site are not for use in any production environment and may not be reliable.
The `proxies` dictionary must follow the exact structure shown in the code sample: one proxy for HTTP connections and one for HTTPS connections. The two can be different, or you can use the same proxy for both protocols.
Also notice that I used the HTTP scheme in the proxy URL for both entries. Not all proxies have an SSL certificate, so the connection to the proxy itself is made over HTTP in both cases.
To authenticate to a proxy you can use this syntax:
http://user:pass@working-proxy:port
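In Python, this might look like the following sketch, where the username, password, host, and port are placeholders you would replace with your own proxy credentials:
import requests

# Credentials are embedded directly in the proxy URL
proxies = {
    'http': 'http://user:pass@working-proxy:port',
    'https': 'http://user:pass@working-proxy:port',
}

res = requests.get('https://httpbin.org/get', proxies=proxies)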
Environment variables
If you don’t plan on using multiple sets of proxies, you can export them as environment variables.
This is how to export environment variables in a Linux shell:
$ export HTTP_PROXY='http://proxy_ip:proxy_port'
$ export HTTPS_PROXY='http://secure_proxy_ip:proxy_port'
To check the environment, just run:
$ env
This is how to export environment variables in PowerShell:
>_ $Env:HTTP_PROXY='http://proxy_ip:proxy_port'
>_ $Env:HTTPS_PROXY='http://secure_proxy_ip:proxy_port'
To check the environment, just run:
>_ Get-ChildItem -Path Env:
This is how to export environment variables in the Command Prompt (note that `set` takes the value without quotes):
> set HTTP_PROXY=http://proxy_ip:proxy_port
> set HTTPS_PROXY=http://secure_proxy_ip:proxy_port
To check the environment, just run:
> set
This way you don’t need to define any proxies in your code. Just make the request and it will work.
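With the variables exported, the code becomes as simple as this; requests reads `HTTP_PROXY` and `HTTPS_PROXY` from the environment automatically:
import requests

# No proxies argument needed: requests picks up HTTP_PROXY / HTTPS_PROXY
# from the environment by default (trust_env is enabled).
res = requests.get('https://httpbin.org/get')
print(res.text)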
How to read the response?
You can read your data in many ways, but in most cases you will want to read it as plain text or as a JSON encoded string.
Plain Text:
response = requests.get(url)
text_resp = response.text
JSON: for JSON-formatted responses, the requests package provides the built-in `.json()` method.
response = requests.get(url)
json_resp = response.json()
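Before parsing, it is often worth confirming that the request actually succeeded. Calling `raise_for_status()` raises an exception for 4xx and 5xx responses, so you never try to parse an error page:
import requests

response = requests.get('https://httpbin.org/get')
response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
json_resp = response.json()
print(json_resp)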
Proxy Sessions
You might also want to scrape data from websites that use sessions. In that case, create a session object by calling `requests.Session()`, assign your proxies to the session's `.proxies` attribute, and then send your requests through the session object. This time you only have to pass the URL as an argument.
import requests

session = requests.Session()

# Every request sent through this session will use these proxies
session.proxies = {
    'http': 'http://proxy_ip:proxy_port',
    'https': 'http://secure_proxy_ip:proxy_port',
}

res = session.get('https://httpbin.org/get')
Make sure to replace `proxy_ip` and `proxy_port` with the actual IP and port of your proxy.
How to Rotate Proxy IPs
To avoid getting blocked by websites, it's important to rotate your proxy IPs. One way to do this is to create a list of proxy IPs and ports and randomly select a proxy for each request.
Here is an example:
import random
import requests

# Replace these placeholders with your own proxy URLs
ip_addresses = [
    'http://proxy_ip1:proxy_port1',
    'http://proxy_ip2:proxy_port2',
]

def proxy_request(url, **kwargs):
    while True:
        try:
            # Pick a random proxy from the list
            proxy = random.choice(ip_addresses)
            proxies = {
                'http': proxy,
                'https': proxy,
            }
            response = requests.get(url, proxies=proxies, timeout=5, **kwargs)
            print(f"Currently using proxy: {proxies['http']}")
            break
        except requests.exceptions.RequestException:
            print("Error encountered, changing the proxy...")
    return response

print(proxy_request('https://httpbin.org/get'))
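Note that with unreliable proxies the loop above may iterate many times before a request succeeds, and it will loop forever if none of the proxies work. In practice, you may want to cap the number of retries and drop proxies that fail repeatedly from the list.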
Hire a professional
While you can manage your own proxies with Python, doing so is time-consuming, and building a good proxy pool can get expensive. To save time and money, you can use a professional scraping tool instead. WebScrapingAPI has built-in proxy management and rotation capabilities, with a pool of verified, high-quality proxies that are more reliable and can save you time and money in the long run.
We also have a proxy mode that you can try for free. To get a free API key, you just have to create an account and start the WebScrapingAPI trial. Here is a code sample showing how to use our proxy mode:
import requests

def get_params(params_dict):
    # Join the parameters with '.' — the separator the proxy username
    # format expects (e.g. "proxy_type=datacenter.device=desktop")
    return '.'.join(f"{key}={value}" for key, value in params_dict.items())

API_KEY = '<YOUR_API_KEY>'
TARGET_URL = 'http://httpbin.org/get'

PARAMETERS = {
    "proxy_type": "datacenter",
    "device": "desktop"
}

PROXY = {
    "http": f"http://webscrapingapi.{get_params(PARAMETERS)}:{API_KEY}@proxy.webscrapingapi.com:80",
    "https": f"https://webscrapingapi.{get_params(PARAMETERS)}:{API_KEY}@proxy.webscrapingapi.com:8000"
}

response = requests.get(
    url=TARGET_URL,
    proxies=PROXY,
    verify=False
)
print(response.text)
Please note that if you want to connect to proxy mode via HTTPS, your code must be configured not to verify SSL certificates. In this case, that means passing `verify=False`, since you are working with Python requests.
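With `verify=False`, urllib3 (used internally by requests) prints an `InsecureRequestWarning` for every request. If you want to silence it, you can disable the warning explicitly:
import urllib3

# Suppress the InsecureRequestWarning emitted when verify=False is used
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)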
Takeaway
Using proxies is an effective way to avoid getting blocked while web scraping. By rotating proxy IPs and using a pool of proxies, you can reduce the chances of getting blocked and increase the chances of success. However, managing your own proxies can be a hassle and it may take a lot of time and money to get a good set of proxies.
When you subscribe to a premium proxy service, such as WebScrapingAPI, you will gain access to a variety of features, such as IP rotation and the ability to switch between datacenter and residential proxies.
We hope that this article has given you a better understanding of how to use a proxy with the Python requests module and how it can help you with your scraping needs. Sign up for our 14-day free trial to test our service and learn about all of its features and functionalities.