How To Use a Proxy Server With Python requests Module - Ultimate Guide
Ștefan Răcila on Apr 20 2023
Introduction
Web scraping is a powerful tool for extracting valuable information from websites. However, it can also put a strain on the servers of the sites you scrape, which is why many websites block IP addresses that make too many requests. To avoid this, you can route your requests through proxies. In this article, I'll show you how to use proxies with Python's requests module and how to rotate proxy IPs to avoid getting blocked.
Setting Up
Before we begin, you'll need to have the following prerequisites:
✅ Python installed
✅ Some experience with Python
✅ The Python requests library installed
✅ A list of proxy IPs and ports
To install Python, you'll need to download the Python installer from the official Python website: https://www.python.org/downloads/
You can choose the latest version of Python 3. It is recommended to use the latest version of Python for the latest features and security updates.
Once the download is complete, run the installer and follow the prompts to install Python on your computer. During the installation process, make sure to check the option to add Python to your system's PATH, which will allow you to run Python from the command line.
After the installation is complete, you can verify that Python is installed correctly by opening a command prompt or terminal and running the command `python --version`. This should display the version of Python that you have installed.
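For example (the exact version shown will depend on your installation):
$ python --version
Python 3.x.x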
You can check whether the `requests` package is installed by opening the terminal and running the following command:
$ pip freeze
`pip` is a package manager that should come bundled with newer versions of Python. If you need to install `pip` separately for any reason, you can follow the instructions in this guide.
`pip freeze` will display all your currently installed Python packages and their versions. Check if the `requests` module is present in that list. If not, install it by running the following command:
$ pip install requests
Configuring the proxies
I'll explain in this section how to set up proxies with `python-requests`. To get started, we need a working proxy and the URL we want to send the request to.
Basic usage
import requests

# Map each protocol to the proxy that should handle it
proxies = {
    'http': 'http://proxy_ip:proxy_port',
    'https': 'http://secure_proxy_ip:proxy_port',
}

res = requests.get('https://httpbin.org/get', proxies=proxies)
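Since `https://httpbin.org/get` echoes the details of your request back to you, including the `origin` IP, you can print `res.text` and check that the IP shown belongs to your proxy rather than your own machine.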
If you do not have any private proxy that you can use to test this code, you can find a free public proxy from the list on freeproxylists.net. Please note that the proxies on that site are not for use in any production environment and may not be reliable.
The `proxies` dictionary must follow the exact structure shown in the code sample: one proxy for HTTP connections and one for HTTPS connections. The two can be different, or you can use the same proxy for both protocols.
Also notice that I used the HTTP scheme in the proxy URL for both entries. Not all proxies have an SSL certificate, so the connection to the proxy itself is made over HTTP in both cases.
To authenticate to a proxy you can use this syntax:
http://user:pass@working-proxy:port
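In Python, this might look like the following sketch, where the username, password, host, and port are placeholders you would replace with your own proxy credentials:
import requests

# Credentials are embedded directly in the proxy URL
proxies = {
    'http': 'http://user:pass@working-proxy:port',
    'https': 'http://user:pass@working-proxy:port',
}

res = requests.get('https://httpbin.org/get', proxies=proxies)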
Environment variables
If you don’t plan on using multiple sets of proxies, you can export them as environment variables.
This is how to export environment variables in a Linux shell:
$ export HTTP_PROXY='http://proxy_ip:proxy_port'
$ export HTTPS_PROXY='http://secure_proxy_ip:proxy_port'
To check the environment, just run:
$ env
This is how to export environment variables in PowerShell:
>_ $Env:HTTP_PROXY='http://proxy_ip:proxy_port'
>_ $Env:HTTPS_PROXY='http://secure_proxy_ip:proxy_port'
To check the environment, just run:
>_ Get-ChildItem -Path Env:
This is how to export environment variables in the Command Prompt (note that `set` takes the value without quotes):
> set HTTP_PROXY=http://proxy_ip:proxy_port
> set HTTPS_PROXY=http://secure_proxy_ip:proxy_port
To check the environment, just run:
> set
This way you don’t need to define any proxies in your code. Just make the request and it will work.
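With the variables exported, the code becomes as simple as this; requests reads `HTTP_PROXY` and `HTTPS_PROXY` from the environment automatically:
import requests

# No proxies argument needed: requests picks up HTTP_PROXY / HTTPS_PROXY
# from the environment by default (trust_env is enabled).
res = requests.get('https://httpbin.org/get')
print(res.text)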
How to read the response?
You can read your data in many ways, but in most cases you will want to read it as plain text or as a JSON encoded string.
Plain Text:
response = requests.get(url)
text_resp = response.text
JSON: for JSON-formatted responses, the requests package provides the built-in `.json()` method.
response = requests.get(url)
json_resp = response.json()
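Before parsing, it is often worth confirming that the request actually succeeded. Calling `raise_for_status()` raises an exception for 4xx and 5xx responses, so you never try to parse an error page:
import requests

response = requests.get('https://httpbin.org/get')
response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
json_resp = response.json()
print(json_resp)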
Proxy Sessions
You might also want to scrape data from websites that use sessions. In that case, create a session object by calling `requests.Session()`, assign your proxies to the session's `.proxies` attribute, and then send your requests through the session object. This time you only have to pass the URL as an argument.
import requests

session = requests.Session()

# Every request sent through this session will use these proxies
session.proxies = {
    'http': 'http://proxy_ip:proxy_port',
    'https': 'http://secure_proxy_ip:proxy_port',
}

res = session.get('https://httpbin.org/get')
Make sure to replace `proxy_ip` and `proxy_port` with the actual IP and port of your proxy.
How to Rotate Proxy IPs
To avoid getting blocked by websites, it's important to rotate your proxy IPs. One way to do this is to create a list of proxy IPs and ports and randomly select a proxy for each request.
Here is an example:
import random
import requests

# Replace these placeholders with your own proxy URLs
ip_addresses = [
    'http://proxy_ip1:proxy_port1',
    'http://proxy_ip2:proxy_port2',
]

def proxy_request(url, **kwargs):
    while True:
        try:
            # Pick a random proxy from the list
            proxy = random.choice(ip_addresses)
            proxies = {
                'http': proxy,
                'https': proxy,
            }
            response = requests.get(url, proxies=proxies, timeout=5, **kwargs)
            print(f"Currently using proxy: {proxies['http']}")
            break
        except requests.exceptions.RequestException:
            print("Error encountered, changing the proxy...")
    return response

print(proxy_request('https://httpbin.org/get'))
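Note that with unreliable proxies the loop above may iterate many times before a request succeeds, and it will loop forever if none of the proxies work. In practice, you may want to cap the number of retries and drop proxies that fail repeatedly from the list.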
Hire a professional
While you can manage your own proxies with Python, doing so is time-consuming, and building a good proxy pool can get expensive. To save time and money, you can use a professional scraping tool instead. WebScrapingAPI has built-in proxy management and rotation capabilities, with a pool of verified, high-quality proxies that are more reliable and can save you time and money in the long run.
We also have a proxy mode that you can try for free. To get a free API key, you just have to create an account and start the WebScrapingAPI trial. Here is a code sample showing how to use our proxy mode:
import requests

def get_params(params_dict):
    # Join the parameters with '.' — the separator the proxy username
    # format expects (e.g. "proxy_type=datacenter.device=desktop")
    return '.'.join(f"{key}={value}" for key, value in params_dict.items())

API_KEY = '<YOUR_API_KEY>'
TARGET_URL = 'http://httpbin.org/get'

PARAMETERS = {
    "proxy_type": "datacenter",
    "device": "desktop"
}

PROXY = {
    "http": f"http://webscrapingapi.{get_params(PARAMETERS)}:{API_KEY}@proxy.webscrapingapi.com:80",
    "https": f"https://webscrapingapi.{get_params(PARAMETERS)}:{API_KEY}@proxy.webscrapingapi.com:8000"
}

response = requests.get(
    url=TARGET_URL,
    proxies=PROXY,
    verify=False
)
print(response.text)
Please note that if you want to connect to proxy mode via HTTPS, your code must be configured not to verify SSL certificates. In this case, that means passing `verify=False`, since you are working with Python requests.
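With `verify=False`, urllib3 (used internally by requests) prints an `InsecureRequestWarning` for every request. If you want to silence it, you can disable the warning explicitly:
import urllib3

# Suppress the InsecureRequestWarning emitted when verify=False is used
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)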
Takeaway
Using proxies is an effective way to avoid getting blocked while web scraping. By rotating proxy IPs and using a pool of proxies, you can reduce the chances of getting blocked and increase the chances of success. However, managing your own proxies can be a hassle and it may take a lot of time and money to get a good set of proxies.
When you subscribe to a premium proxy service, such as WebScrapingAPI, you will gain access to a variety of features, such as IP rotation and the ability to switch between datacenter and residential proxies.
We hope that this article has given you a better understanding of how to use a proxy with the Python requests module and how it can help you with your scraping needs. Sign up for our 14-day free trial to test our service and learn about all of its features and functionalities.