Probably the most sought-after reason to manipulate HTTP headers is to change, or "spoof", your user agent, for legitimate or nefarious purposes. By default, Python's urllib2 sends Python-urllib/2.6 as its user agent (the version number tracks the Python version you are running). To change the user agent and the other request headers, we'll need to make a few changes to the basic Python URL request. This time I wrote a function so we can reuse it later.
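You can see the default header for yourself with the minimal sketch below. It assumes Python 3, where urllib2 was merged into urllib.request, so the version in the printed string will be your own Python version, not 2.6. It also shows a second technique that this post does not use: overriding an opener's `addheaders` list, which changes the default headers for every request sent through that opener. The replacement user-agent string is a made-up example.

```python
import urllib.request

# build_opener() returns an OpenerDirector. Its addheaders list holds the
# headers added to every request, starting with the default User-agent.
opener = urllib.request.build_opener()
default_headers = list(opener.addheaders)
print(default_headers)  # a list holding ('User-agent', 'Python-urllib/<major>.<minor>')

# Replacing addheaders changes the default for every request made through
# this opener. A header set on an individual Request still takes priority.
# The user-agent string below is a hypothetical example.
opener.addheaders = [('User-agent', 'Mozilla/5.0 (example)')]
print(opener.addheaders)
```

No network connection is made here; building an opener only sets up its handlers.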
There are different ways to change the headers. For instance, urllib2 Request objects have an add_header method that takes a key and a value (e.g. 'User-Agent', 'Mozilla/5.0...'). However, we don't want to call that method separately for every header we want to change.
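For comparison, here is a minimal sketch of the one-call-per-header approach, written against Python 3's urllib.request (whose Request.add_header behaves like urllib2's). The header values are illustrative placeholders. Note a detail that often surprises people: Request normalizes header names with `str.capitalize()`, so later lookups must use that form.

```python
import urllib.request

# Hypothetical header values, for illustration only.
UA = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)'

req = urllib.request.Request('http://python.org/')
# One add_header(key, value) call per header we want to set.
req.add_header('User-Agent', UA)
req.add_header('Referer', 'http://python.org')

# Request stores header names via str.capitalize(), so
# 'User-Agent' is kept (and looked up) as 'User-agent'.
print(req.get_header('User-agent'))  # the UA string
print(req.has_header('User-Agent'))  # False: lookups use the stored spelling
```

Creating a Request does not open a connection, so this runs offline.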
import urllib2

def get_url(url):
    '''get_url accepts a URL string and returns the server response code, response headers, and contents of the file'''
    req_headers = {
        'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13',
        'Referer': 'http://python.org'}
    request = urllib2.Request(url, headers=req_headers)  # create a request object for the URL
    opener = urllib2.build_opener()  # create an opener object
    response = opener.open(request)  # open a connection and receive the HTTP response headers + contents
    code = response.code  # HTTP status code, e.g. 200
    headers = response.headers  # headers object
    contents = response.read()  # contents of the URL (HTML, javascript, css, img, etc.)
    return code, headers, contents
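If you are on Python 3, where urllib2 no longer exists, a minimal sketch of the same function might look like this. It is a port under a few assumptions: `get_url` here takes the headers as an optional `req_headers` argument (a hypothetical change so it can be reused), it reads the status from `response.status`, and the `EchoHandler` class is a throwaway local test server, not part of the original post. The server echoes back the User-Agent and Referer it received, which lets you confirm the spoofed headers actually go out on the wire without contacting a real website.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def get_url(url, req_headers=None):
    '''Return (status code, response headers, body bytes) for url.'''
    request = urllib.request.Request(url, headers=req_headers or {})
    opener = urllib.request.build_opener()
    with opener.open(request) as response:
        return response.status, response.headers, response.read()


class EchoHandler(BaseHTTPRequestHandler):
    '''Hypothetical local server: echoes the User-Agent and Referer it received.'''

    def do_GET(self):
        # self.headers lookups are case-insensitive.
        text = '{}\n{}'.format(self.headers.get('User-Agent'),
                               self.headers.get('Referer'))
        body = text.encode()
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


# Port 0 asks the OS for any free port on the loopback interface.
server = HTTPServer(('127.0.0.1', 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

code, headers, contents = get_url(
    url, {'User-Agent': 'MyAgent/1.0', 'Referer': 'http://python.org'})
print(code, contents.decode())

server.shutdown()
server.server_close()
```

The request's own User-Agent wins over the opener's default because urllib only adds a default header when the Request does not already carry one.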
So instead, we created a dictionary called req_headers and, when we built the Request object, passed the dict to the Request constructor through its headers argument.
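The dict approach is just a shortcut: the Request constructor loops over the dictionary and calls add_header for each entry. The sketch below checks that claim by building one request each way and comparing the stored headers. It uses Python 3's urllib.request, whose constructor does the same thing as urllib2's; the header values are placeholders.

```python
import urllib.request

req_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)',
    'Referer': 'http://python.org',
}

# Passing the dict: the constructor calls add_header for each item.
req_dict = urllib.request.Request('http://python.org/', headers=req_headers)

# The manual equivalent: one add_header call per dictionary entry.
req_manual = urllib.request.Request('http://python.org/')
for key, value in req_headers.items():
    req_manual.add_header(key, value)

print(sorted(req_dict.header_items()) == sorted(req_manual.header_items()))  # True
```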
This is a simple example of a Python function that lets you set various HTTP request headers, such as User-Agent and Referer. In our next post, we'll show you how to handle redirects and error pages (e.g. 301, 302, 404, 503).


Comments
Very useful
pythonic guest, November 6, 2009, 6:57pm: Thank you! It's so infuriating when websites choose to treat different visitors differently!
I love this code, now I can use Google search in my Python script
Josh, October 27, 2011, 3:39pm: I don't know why Google would ban Python's default user agent. It's not like they don't have enough server computers to serve us our links. Thanks for the hack. I promise to use it for good.
pythonic guest, January 21, 2012, 10:12am: Thanks for this.