In this post, I'll present a quick way to get up and running with real estate data analysis in Python. We'll make a request to the OpenHouse API, retrieve some data, and then do a quick analysis. You won't need any experience beyond basic Python to follow this walkthrough.
%matplotlib inline
import matplotlib.pyplot as plt
import requests
import json
import pandas as pd
The URL below makes a specific request to the OpenHouse API. If you read through it, you can see the query parameters being used and edit them to suit your own needs. The OpenHouse gallery also offers a user interface for customizing this request.
url = "http://api.openhouseproject.co/api/property/?min_price=0&max_price=5000000&min_bedrooms=0&max_bedrooms=8&min_bathrooms=0&max_bathrooms=7&min_building_size=100&max_building_size=4000&close_to=(158.58473030020312,33.99802726234877,-118.223876953125)"
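If you plan to edit the parameters often, it may be easier to keep them in a dictionary and build the query string from it. The sketch below uses only the standard library; the names `BASE_URL`, `params` and `built_url` are my own, and the values are copied from the URL above. It assumes the API treats the generated string the same way as the handwritten one.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.openhouseproject.co/api/property/"

# Same filters as the handwritten URL, one per line so they are easy to edit
params = {
    "min_price": 0,
    "max_price": 5000000,
    "min_bedrooms": 0,
    "max_bedrooms": 8,
    "min_bathrooms": 0,
    "max_bathrooms": 7,
    "min_building_size": 100,
    "max_building_size": 4000,
    "close_to": "(158.58473030020312,33.99802726234877,-118.223876953125)",
}

# safe="()," leaves parentheses and commas unescaped, matching the URL above
built_url = BASE_URL + "?" + urlencode(params, safe="(),")
print(built_url)
```

Changing a filter is then a one-line edit to the dictionary rather than a search through a long string.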
Let's retrieve the data and confirm that the server sent us a good response (i.e. the status code is 200).
r = requests.get(url)
r.status_code == 200
The results come back from the API as bytes encoded in UTF-8. Let's decode them and parse the resulting JSON into a Python object. If you're not already familiar with JSON, you should stop and learn about it first. People with basic Python experience should find it intuitive and straightforward.
response = json.loads(r.content.decode("utf-8"))
print(response.keys())
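To see the two steps separately, here is a small sketch on a made-up payload (the values are invented; only the overall shape is meant to resemble the API response). Decoding turns bytes into a string, and `json.loads` turns that string into ordinary Python dictionaries and lists.

```python
import json

# Invented payload for illustration only; not real API output
raw = b'{"count": 1, "next": null, "previous": null, "results": [{"price": 250000.0}]}'

text = raw.decode("utf-8")   # bytes -> str
sample = json.loads(text)    # str -> dict (JSON null becomes None)

print(type(raw).__name__, type(text).__name__, type(sample).__name__)
print(sorted(sample.keys()))
```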
The response is a dictionary that has four keys.
The count value is the number of records returned.
The previous and next values are useful for paginating through the data if there are more results available than what you retrieved. We won't be using those in this post.
Finally, results is the most interesting part to us. It's a list of property details. Let's print out one as an example and see what it contains.
home = response['results'][0]
for key in home.keys():
    if key not in ('raw_address', 'address_object', 'valid', 'submitter'):
        print(key + ":")
        print(home[key])
        print("-------------")
I left out the address_object because it's large and worth unpacking on its own.
keys = home['address_object'].keys()
for key in keys:
    if key != 'raw':
        print(key + ":")
        print(home['address_object'][key])
        print("-------------")
Let's convert this JSON response into a Pandas data frame. If you're not already familiar with Pandas, stop everything and go get an introduction. Pandas is the Swiss Army knife for data in Python. A good tutorial can be found here.
Due to the nested nature of the response, Pandas doesn't give us exactly what we want out of the box, so let's take a few steps to fix that as well. I'm going to provide a list of fields that I'd like to extract from the address_object and pull them out into new columns. Once that's done, we can delete the address_object column and see the results.
df = pd.DataFrame(response['results'])
fields = ['area_level_1', 'area_level_2', 'country', 'formatted_address', 'latitude', 'longitude', 'postal_code']
for field in fields:
    df[field] = df['address_object'].apply(lambda x: x[field])
del df['address_object']
df.head()
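If the flattening step feels opaque, it may help to see the same idea applied to a single record in plain Python. The record below is made up (the values are invented, and only the nesting mirrors the response); the loop copies the chosen nested fields up to the top level and leaves the nested dictionary behind, which is what the data frame code does for every row at once.

```python
# Invented record for illustration; only the nesting mirrors the API response
record = {
    "price": 250000.0,
    "building_size": 1200.0,
    "address_object": {
        "latitude": 34.0,
        "longitude": -118.2,
        "postal_code": "90001",
    },
}
fields = ["latitude", "longitude", "postal_code"]

# Start with every top-level value except the nested dictionary
flat = {key: value for key, value in record.items() if key != "address_object"}

# Promote each selected nested field to a top-level key
for field in fields:
    flat[field] = record["address_object"][field]

print(flat)
```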
If you've got an analysis already in mind, you should be able to take it from there! To leave you with a slightly more complete boilerplate to build on, let's do one plot of the relationship between building size and price.
plt.figure(figsize=(8,8))
plt.scatter(df['building_size'], df['price'])
plt.xlabel('Building Size', fontsize=18)
plt.ylabel('Last Sale Price', fontsize=18)
plt.show()
And just in case we want to make sure we have the exact same dataset in the future, let's write it to a tab-separated file.
df.to_csv('my_example_data.csv', sep='\t', index=False)
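When you load the file again, remember that it is tab-separated even though the name ends in .csv, so the reader needs the same separator (in Pandas, that means passing sep='\t' to pd.read_csv). The sketch below shows the round trip with the standard library's csv module on two made-up rows, writing to an in-memory buffer instead of a real file.

```python
import csv
import io

# Invented rows for illustration; values are strings because csv reads text
rows = [
    {"price": "250000.0", "building_size": "1200.0"},
    {"price": "410000.0", "building_size": "1850.0"},
]

# Write the rows with a tab delimiter
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["price", "building_size"], delimiter="\t")
writer.writeheader()
writer.writerows(rows)

# Read them back using the same delimiter
buffer.seek(0)
restored = list(csv.DictReader(buffer, delimiter="\t"))

print(restored == rows)
```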