This article covers polling the Steam Web API using Python 3. Steam, in this case, refers to the gaming platform developed by Valve. The code is not explained in depth, but it is annotated with comments and should be easy to follow.

Why did I start crawling Steam Servers?

Following a COMP2521 assignment on implementing a basic version of a search engine, I started wondering whether it would be possible to assign a PageRank to every Steam profile, using either Steam level or Steam inventory value as page weights – after all, a higher level or a more valuable inventory means an account worth more money, which should mean more trustworthy, right?

As PageRank was designed to rank nodes (webpages) connected by links (hyperlinks), implementing PageRank on Steam means treating SteamID64s as the nodes (every Steam account has its own SteamID64 with a corresponding profile page) and friendships as the links.
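For reference, the standard (unweighted) PageRank update – with d the damping factor, N the total number of accounts, In(u) the accounts that have u as a friend, and L(v) the number of friends of v – is:

PR(u) = (1 - d) / N + d * sum over v in In(u) of PR(v) / L(v)

Using Steam level or inventory value as weights would mean biasing this base formula towards "expensive" accounts.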

Luckily for us, the Steam Web API has an endpoint for getting the friends list of an account, and another endpoint for the Steam level of an account. But for the purposes of PageRank, I would need to crawl as many Steam accounts as possible, which is nowhere near feasible to do by hand – here’s where automation and Python 3 come in.
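Before the full script, here is a minimal sketch of polling both endpoints – the SteamID64 below is just an example, and you will need your own API key from https://steamcommunity.com/dev/apikey:

import requests

params = {"key": "YOUR_API_KEY", "steamid": "76561197960287930"}  # example account

# GetFriendList responds with {"friendslist": {"friends": [{"steamid": "...", ...}, ...]}}
friends = requests.get("https://api.steampowered.com/ISteamUser/GetFriendList/v1/", params=params)
print(friends.json())

# GetSteamLevel responds with {"response": {"player_level": ...}}
level = requests.get("https://api.steampowered.com/IPlayerService/GetSteamLevel/v1/", params=params)
print(level.json())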

Setting Up

We want to be able to create a queue – a way for our Python script to know which SteamID64 to crawl next. For this, I chose PostgreSQL – a database that will let me timestamp each of my queries. You will need to construct a table with a char field for the SteamID64, a timestamp field for when it was last crawled, and a boolean field for whether it has been crawled before.
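A minimal sketch of creating that table with psycopg2 – the table and column names are the ones the crawler below expects, but the exact types and the seed account are my own assumptions:

import psycopg2

conn = psycopg2.connect(host="localhost", database="steam", user="user", password="pass")
cur = conn.cursor()
# steamid64 is always 17 digits; crawled_before marks accounts we've already visited
cur.execute("""
CREATE TABLE IF NOT EXISTS to_crawl (
    steamid64      char(17) PRIMARY KEY,
    last_crawled   timestamp NOT NULL,
    crawled_before boolean   NOT NULL DEFAULT FALSE
);
""")
# seed the queue with one account so the crawler has somewhere to start
cur.execute("INSERT INTO to_crawl (steamid64, last_crawled, crawled_before) VALUES (%s, current_timestamp, FALSE);",
            ("76561197960287930",))
conn.commit()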

In terms of Python 3 modules, we need a way to interact with our database (psycopg2 for PostgreSQL) and a way to poll the Steam Web API (requests); both are installable with pip.

Example Code

import psycopg2
import requests
import json
import os

from datetime import datetime

APIKEY=""
DB_USER = ""
DB_PASS = ""
DB_NAME = ""
path = "/"

# Steam level requirements are not linear
# i.e. leveling from 0 to 10 is easier than from 90 to 100
# xp is a better indicator for PageRank than level
def calculate_xp(level):
    xp = 0
    tiers = int(level) // 10  # number of fully completed 10-level tiers
    # tier i (levels 10(i-1)+1 to 10i) costs i * 100 XP per level,
    # so the completed tiers sum to 100 * 10 * (1 + 2 + ... + tiers)
    for i in range(tiers + 1):
        xp += i * 10
    xp = xp * 100
    # remaining levels sit in tier (tiers + 1) at (tiers + 1) * 100 XP each
    for i in range(tiers * 10, int(level)):
        xp += (tiers + 1) * 100
    return str(xp)
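
# Sanity check – example outputs from the function as written:
#   calculate_xp("10")  -> "1000"   (levels 1-10 cost 100 XP each)
#   calculate_xp("15")  -> "2000"   (levels 11-15 cost 200 XP each)
#   calculate_xp("100") -> "55000"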


def crawl_oldest():
    try:
        # grab the least recently crawled steamid64 – this acts as our queue
        cur.execute("SELECT steamid64, last_crawled FROM to_crawl ORDER BY last_crawled LIMIT 1;")
        row = cur.fetchone()
        ID64 = int(row[0])
        print("Crawling STEAMID64(" + row[0] + ") with timestamp (" + row[1].strftime("%H:%M:%S.%f - %b %d %Y") + ")")
        # define our parameters for our API requests
        parameters = {"key": APIKEY, "steamid": ID64}

        friends = requests.get("https://api.steampowered.com/ISteamUser/GetFriendList/v1/", params=parameters)
        # Error checking: some accounts return an Internal Server Error, e.g.
        # https://api.steampowered.com/ISteamUser/GetFriendList/v1/?key=APIKEY&steamid=76561197996004766
        # Weird, as that should mean the Steam account does not exist, yet someone had it as a friend?
        if friends.status_code != 200:
            cur.execute("UPDATE to_crawl SET last_crawled = current_timestamp, crawled_before = TRUE WHERE steamid64 = %s;",
                    (row[0],));
            conn.commit()
            return
        data_friends = friends.json()
        with open(path + row[0] + ".friends.json", "w") as friends_file:
            friends_file.write(json.dumps(data_friends))

        level = requests.get("https://api.steampowered.com/IPlayerService/GetSteamLevel/v1/", params=parameters)
        data_level = level.json()
        with open(path + row[0] + ".level.json", "w") as level_file:
            level_file.write(json.dumps(data_level))

        # handle if friends list set to private
        if "friendslist" in data_friends:
            # if friends haven't been crawled before, add to database
            for i in data_friends['friendslist']['friends']:
                cur.execute("SELECT steamid64, last_crawled FROM to_crawl WHERE steamid64=%s;", (i['steamid'],));
                if cur.rowcount == 0:
                    cur.execute("INSERT INTO to_crawl (steamid64, last_crawled, crawled_before) VALUES (%s, current_timestamp, FALSE);", (i['steamid'],))
                    conn.commit()
        # update last_crawled time
        cur.execute("UPDATE to_crawl SET last_crawled = current_timestamp, crawled_before = TRUE WHERE steamid64 = %s;", (row[0],));
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)


print("Connecting to database...")
conn = psycopg2.connect(host="localhost", database=DB_NAME, user=DB_USER, password=DB_PASS)

cur = conn.cursor()
for i in range(0, 10000):
    crawl_oldest()

While this code works fine, it could be faster. Try crawling multiple accounts at once using os.fork()!
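A minimal sketch of that idea, replacing the final loop above – the worker count and the split of the 10,000 iterations are my own choices. Each child must open its own connection, since a psycopg2 connection must not be shared across fork(). Note that workers can still race to pick the same "oldest" row; a SELECT ... FOR UPDATE SKIP LOCKED inside crawl_oldest would be one way to avoid that.

NUM_WORKERS = 4  # arbitrary worker count

for _ in range(NUM_WORKERS):
    if os.fork() == 0:
        # child process: open a fresh connection, as the parent's cannot be shared
        conn = psycopg2.connect(host="localhost", database=DB_NAME, user=DB_USER, password=DB_PASS)
        cur = conn.cursor()
        for _ in range(10000 // NUM_WORKERS):
            crawl_oldest()
        os._exit(0)

# parent process: wait for every child to finish
for _ in range(NUM_WORKERS):
    os.wait()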