Probably the most sought-after use of manipulating HTTP headers is changing, or "spoofing", your user agent, for legitimate or nefarious purposes. By default, Python's urllib2 uses Python-urllib/2.6 (on Python 2.6) as its user agent. To change the user agent and the other request headers, we'll have to make a few changes to the basic Python URL request. This time I wrote a function so we can re-use it later.
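If you want to see the default for yourself, here is a small sketch that reads it off a freshly built opener without touching the network. The helper name default_user_agent is made up for this example, and the try/except import is only there so the snippet also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2  # Python 2, as used in this post
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (assumption: same names)

def default_user_agent():
    '''Return the User-Agent string a freshly built opener sends by default.'''
    opener = urllib2.build_opener()
    # addheaders is a list of (name, value) tuples the opener adds to requests
    return dict(opener.addheaders).get('User-agent')

# Prints something like Python-urllib/2.6; the suffix follows your Python version
print(default_user_agent())
```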
There are different ways you can change the headers. For instance, urllib2 Request objects have an add_header method that takes a key and a value (e.g. 'User-Agent', 'Mozilla/5.0...'). However, we don't want to have to call the method once for every header we want to change.
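For comparison, the one-call-per-header approach might look roughly like this. It is only a sketch: build_request is a hypothetical helper, the URL and the 'Mozilla/5.0 (example)' string are placeholders, and nothing is actually sent over the network.

```python
try:
    import urllib2  # Python 2, as used in this post
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (assumption: same names)

def build_request(url):
    '''Build (but do not send) a request, setting headers one call at a time.'''
    request = urllib2.Request(url)
    request.add_header('User-Agent', 'Mozilla/5.0 (example)')  # placeholder value
    request.add_header('Referer', 'http://python.org')
    return request

request = build_request('http://example.com/')
# Header names are stored capitalized, so 'User-Agent' is looked up as 'User-agent'
print(request.get_header('User-agent'))
print(request.get_header('Referer'))
```

Two headers are manageable this way, but every extra header means another add_header call.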
import urllib2

def get_url(url):
    '''get_url accepts a URL string and returns the server response code, response headers, and contents of the file'''
    req_headers = {
        'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13',
        'Referer': 'http://python.org'}
    request = urllib2.Request(url, headers=req_headers) # create a request object for the URL
    opener = urllib2.build_opener() # create an opener object
    response = opener.open(request) # open a connection and receive the http response headers + contents
    code = response.code # HTTP status code, e.g. 200
    headers = response.headers # headers object
    contents = response.read() # contents of the URL (HTML, javascript, css, img, etc.)
    return code, headers, contents
So instead, we created a dictionary called req_headers and, when we built the Request object, passed the dict to the Request constructor through its headers argument.
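A quick way to convince yourself that the dictionary really ends up on the request is to inspect the Request object before sending it. The sketch below assumes a placeholder URL and a placeholder user agent string, and again makes no network call.

```python
try:
    import urllib2  # Python 2, as used in this post
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (assumption: same names)

req_headers = {
    'User-Agent': 'Mozilla/5.0 (example)',  # placeholder value
    'Referer': 'http://python.org'}

request = urllib2.Request('http://example.com/', headers=req_headers)

# header_items() returns the (name, value) pairs set on the Request so far;
# the opener adds others (such as Host) when the request is actually sent
sent = dict(request.header_items())
print(sorted(sent.items()))
```

Note that the header names come back capitalized ('User-agent' rather than 'User-Agent'), which is how the Request object stores them internally.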
This is a simple example of a Python function that lets you set various HTTP request headers such as the User-Agent and Referer. In our next post, we'll show you how to handle redirects and error pages (e.g. 301, 302, 404, 503).


Comments
Very useful
November 6, 2009 - 6:57pm - pythonic guest
Thank you! It's so infuriating when websites choose to treat different visitors differently!