Reverse Engineering Soundcloud API

Hi guys, in this post we will learn how to bypass the download restrictions on Soundcloud. We will create a Python script that lets us download even those songs which are not enabled for downloading. We will work through the project step by step, tackling each problem as we encounter it. I will try to keep the approach as general as possible so that you can follow along even if Soundcloud has changed its website layout or the way it serves media files. So without any further ado, let’s get started:

Note: I don’t endorse illicit downloading of someone else’s content. This is merely an educational guide and should be used to download your own content only.

1. Reverse Engineering the MP3 URL Generation Logic

Let’s start by opening up Chrome. Soundcloud doesn’t provide us with the .mp3 url on the media page, so we need to figure out how and from where Soundcloud gets it. Open up Soundcloud and open this publicly accessible music file, which we will be using for testing purposes.

Now we need to open the Chrome developer tools. The network tab in the chrome developer tools will allow us to see all of the requests which the browser makes when we open Soundcloud. After opening up the developer tools and navigating to the network tab you should end up with something similar to this:

Now refresh the page with the developer tools open. You should start seeing the requests pane getting populated by tons of different links. Don’t feel intimidated, we will make sense of all of this in just a bit. You can see that there are already 100+ requests being made by Soundcloud. We need to find a way to filter the requests so that they become manageable for us to sift through.

While looking at the requests in general I saw that Soundcloud is making multiple requests to an api.soundcloud.com endpoint. If you ever see any requests being made to an api endpoint always explore these first. Here is what you will end up with after filtering those requests which have api in their url:

Now after filtering the requests I saw that there was a stream url. That caught my attention because most of the time stream urls do exactly what they stand for. They stream the media content. So I clicked on the stream link and saw what response we were getting from Soundcloud on that endpoint:


And lo and behold, that endpoint returns a couple of media links. The one we are interested in is http_mp3_128_url, because plain MP3 links are usually the most straightforward to download.

There seems to be a problem. Whenever we try opening the http_mp3_128_url link in a new tab we are greeted with a 403 Forbidden error. There is definitely something fishy going on, because if I scroll down in the developer tools I can see that Soundcloud is successfully accessing that url without any Forbidden error. Most of the time, the server checks the headers and cookies of the browser to verify that an authorized person is accessing the endpoint. However, I am not logged in, so there might be something else going on.

After refreshing the page a couple of times I observed that http_mp3_128_url url changed after every refresh. That must mean that the urls are for one time use only and are programmatically generated on every access. And after the browser plays the media file for the first time, the urls expire and that is the reason we were getting a Forbidden error. To verify my observation I opened the stream url in a new tab and then tried accessing the http_mp3_128_url url myself, before the Soundcloud player.

All of a sudden we are able to access the media file without the Forbidden error!

Now we need to deconstruct the stream url as well so that we can generate it ourselves. The stream url in my case is this:

https://api.soundcloud.com/i1/tracks/391350885/streams?client_id=6pDzV3ImgWPohE7UmVQOCCepAaKOgrVL

Everything seems pretty generic. The client_id is definitely the SoundCloud API key because I am not logged in. The interesting part of the url is 391350885, which is not a part of the original media url. Where did this number come from?

I filtered the network requests with this number and couldn’t find its source. The very next thing I did was search the HTML source of the page and bam! The track number was embedded in that!
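As a rough sketch, the stream url quoted earlier can be taken apart and rebuilt with Python’s standard library. The helper names split_stream_url and build_stream_url are made up for this illustration, and the path layout is assumed from that single url, so it may not hold for other tracks or after Soundcloud changes its API.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# The stream url observed in the network tab (copied from the post).
STREAM_URL = ("https://api.soundcloud.com/i1/tracks/391350885/streams"
              "?client_id=6pDzV3ImgWPohE7UmVQOCCepAaKOgrVL")


def split_stream_url(stream_url):
    """Return (track_id, client_id) from a url shaped like STREAM_URL."""
    parts = urlsplit(stream_url)
    # Assumed path layout: /i1/tracks/<track_id>/streams
    segments = parts.path.strip("/").split("/")
    track_id = segments[segments.index("tracks") + 1]
    client_id = parse_qs(parts.query)["client_id"][0]
    return track_id, client_id


def build_stream_url(track_id, client_id):
    """Rebuild the stream url from its two variable pieces."""
    query = urlencode({"client_id": client_id})
    return "https://api.soundcloud.com/i1/tracks/{0}/streams?{1}".format(track_id, query)


track_id, client_id = split_stream_url(STREAM_URL)
print(track_id)
print(build_stream_url(track_id, client_id) == STREAM_URL)
```

Only the track id changes from song to song here; the client_id is whatever key the web player happened to be using when the request was captured.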

Now that we know how Soundcloud generates the .mp3 url, it is the perfect time to write a script to automate this. The script should take in a Soundcloud url and should return an mp3 url. So let’s get started.

2. Creating a Python Script for Automating the URL generation

Start up by creating an app.py file in your directory. This will hold all of the required code.

$ touch app.py

Now import the required libraries. We will be using requests for making the HTTP requests, sys for taking command-line inputs and re for extracting the text from the HTML page. A lot of people object to the usage of re for extracting text from HTML but in this case where we know that we are only extracting a small piece of text from the page it is fine.

import requests
import re
import sys

Let’s write down the preliminary code for taking in a Soundcloud URL from the command line and opening up the Soundcloud page using requests.

import sys
import requests
import re

url = sys.argv[-1]
html = requests.get(url)

We are not using argparse because we will soon be converting this script into an online API. Now we need to find a way to extract the track id from the page. Here is a simple regex which works:

track_id = re.search(r'soundcloud://sounds:(.+?)"', html.text)
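Note that re.search returns None when the pattern is not found, so it may be worth checking the result before calling .group(1). Here is a small sketch of that idea. The SAMPLE_HTML fragment and the extract_track_id helper are hypothetical; the real markup around the id may look different.

```python
import re

# Hypothetical page fragment; only the soundcloud://sounds:<id> marker
# is taken from the post.
SAMPLE_HTML = '<meta property="al:ios:url" content="soundcloud://sounds:391350885">'


def extract_track_id(page_text):
    """Return the track id as a string, or None when the marker is absent."""
    match = re.search(r'soundcloud://sounds:(.+?)"', page_text)
    return match.group(1) if match else None


print(extract_track_id(SAMPLE_HTML))
print(extract_track_id("<html>no track here</html>"))
```

The first call prints 391350885 and the second prints None, which is easier to handle than the AttributeError you would get from calling .group(1) on a failed match.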

Now we need to open up the api url and get the actual mp3 stream link. To do that add the following code to your python file:

final_page = requests.get("https://api.soundcloud.com/i1/tracks/{0}/streams?client_id=6pDzV3ImgWPohE7UmVQOCCepAaKOgrVL".format(track_id.group(1)))
print(final_page.json()['http_mp3_128_url'])
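If Soundcloud ever renames or drops that key, the line above will fail with a bare KeyError. A minimal sketch of a friendlier lookup is shown below. The pick_mp3_url helper and the SAMPLE_BODY payload are invented for illustration; only the http_mp3_128_url key name comes from the response we saw in the developer tools.

```python
import json

# Hypothetical response body; the url value is a placeholder.
SAMPLE_BODY = '{"http_mp3_128_url": "https://example.invalid/track.128.mp3"}'


def pick_mp3_url(body_text):
    """Return the MP3 link from a streams response, or raise ValueError."""
    payload = json.loads(body_text)
    try:
        return payload["http_mp3_128_url"]
    except KeyError:
        raise ValueError("response has no http_mp3_128_url; the API may have changed")


print(pick_mp3_url(SAMPLE_BODY))
```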

And there you go. You have the complete script which will give you the mp3 link from a Soundcloud media url. Here is the complete code:

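import re
import sys

import requests

# The Soundcloud page URL is the last command-line argument.
url = sys.argv[-1]
html = requests.get(url)

# The track id is embedded in the page as soundcloud://sounds:<id>
track_id = re.search(r'soundcloud://sounds:(.+?)"', html.text)
if track_id is None:
    sys.exit("Could not find a track id in the page")

# Ask the streams endpoint for the media links and print the MP3 one.
final_page = requests.get("https://api.soundcloud.com/i1/tracks/{0}/streams?client_id=6pDzV3ImgWPohE7UmVQOCCepAaKOgrVL".format(track_id.group(1)))
print(final_page.json()['http_mp3_128_url'])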

Go on, save this in a file and run it. But the problem is that this isn’t terribly useful. How about we turn this into a web app which anyone can use? Now that would be a lot more useful.

3. Turning this into a web app

We will be using Flask to convert this into a web app. The Flask website provides us with some very basic code which we can use as our starting point.

from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

Save the above code in an app.py file. Run the following command in the terminal:

$ FLASK_APP=app.py flask run

This will tell the flask command line program about where to find our flask code which it needs to serve. If everything is working fine, you should see the following output:

 * Running on http://localhost:5000/

Now we need to implement a custom URL endpoint which takes the Soundcloud media URL as input and redirects the user to the MP3 file URL. Let’s name our custom endpoint /generate_link and make it accept query parameters.

@app.route("/generate_link")
def generate_link():
    media_url = request.args.get('url','')
    return media_url

Our custom end-point doesn’t really do anything. It simply echoes back whatever you pass it through the url query parameter. The reason for not implementing the rest of the functionality is that we haven’t actually converted our previous script into a module. Let’s do that real quick first:

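import re

import requests

def get_link(url):
    """Return the MP3 stream url for a Soundcloud media page url."""
    html = requests.get(url)
    # The track id is embedded in the page as soundcloud://sounds:<id>
    track_id = re.search(r'soundcloud://sounds:(.+?)"', html.text)
    if track_id is None:
        raise ValueError("No track id found in the page")
    final_page = requests.get("https://api.soundcloud.com/i1/tracks/{0}/streams?client_id=6pDzV3ImgWPohE7UmVQOCCepAaKOgrVL".format(track_id.group(1)))
    return final_page.json()['http_mp3_128_url']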

I am assuming that this module is saved into the same directory as your flask app. Here is my current directory structure:

$ ls
app.py soundcloudDownload.py

The soundcloudDownload.py contains the script (now converted to a module) and the app.py contains the flask app. Now let’s import soundcloudDownload.py into our app.py file and implement the functionality to make the web app a bit more useful:

from soundcloudDownload import get_link
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

@app.route("/generate_link")
def generate_link():
    media_url = request.args.get('url','')
    return get_link(media_url)

Now restart your flask app in your terminal and try accessing the following url:

http://localhost:5000/generate_link?url=https://soundcloud.com/m-yasoob-khalid/shutdown
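The url above works as-is, but a media link that carries its own ? or & characters would get mixed up with our query string. A small sketch of building the request url with proper encoding is shown here; make_request_url is a made-up helper name, and the base address assumes the default local Flask server from earlier.

```python
from urllib.parse import urlencode


def make_request_url(media_url, base="http://localhost:5000/generate_link"):
    """Build a /generate_link url with the media url percent-encoded."""
    return base + "?" + urlencode({"url": media_url})


print(make_request_url("https://soundcloud.com/m-yasoob-khalid/shutdown"))
```

Flask decodes the percent-encoding again, so request.args.get('url') still hands back the original Soundcloud link.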

If everything works fine you should get something similar to this in the response in your browser:

https://cf-media.sndcdn.com/og4Ho8QAsLWj.128.mp3?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLW1lZGl
hLnNuZGNkbi5jb20vb2c0SG84UUFzTFdqLjEyOC5tcDMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOj
E1MTczNTA3NDV9fX1dfQ__&Signature=XQAyN~Atl8OGeqwmxKa7Zx7S50YX229mdIq-XiU753cGKEmWac8FGK~GdSylj0Uo2sqBnJxzDA
fC3Ahv1MbY~LPGQ8A-q36-vwF6Z5v88-BvflDMmYuXnj0gqWvolR1GMq6SsgMPRGCfNu4D8cS0NckRCif8dGCEQxQVQ2laSCC4e4lpkuqtS
gJOJ6L26N8zrma~2lCJc7TxqCp3~aROuejC-4JVm7P6f4vtB38-l7vT-nWjrsHNC33YLI~Kex6ciOeRGGmFU-eyUDSpooIzrfj6wiR-1A66
MLWFkuUoKboSRfy9Zz6zFSqgPTXZKePHKoKuMzDjEAV42j5Gbm8dgQ__&Key-Pair-Id=APKAJAGZ7VMH2PFPW6UQ
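The Policy parameter in the sample link above looks like base64 with a few characters swapped (- for +, _ for =, ~ for /), which is the substitution used by Amazon CloudFront signed urls. Assuming that is what it is, the sketch below decodes it. The decode_policy helper is hypothetical, and the string is the Policy value from the sample link with the line breaks removed.

```python
import base64
import json

# Policy value from the sample link above, joined into one string.
POLICY_PARAM = (
    "eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLW1lZGlhLnNuZGNkbi5jb20v"
    "b2c0SG84UUFzTFdqLjEyOC5tcDMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7"
    "IkFXUzpFcG9jaFRpbWUiOjE1MTczNTA3NDV9fX1dfQ__"
)


def decode_policy(value):
    """Undo the url-safe substitutions and parse the policy as JSON."""
    standard = value.replace("-", "+").replace("_", "=").replace("~", "/")
    return json.loads(base64.b64decode(standard))


policy = decode_policy(POLICY_PARAM)
statement = policy["Statement"][0]
print(statement["Resource"])
print(statement["Condition"]["DateLessThan"]["AWS:EpochTime"])
```

For this sample the decoded policy names the .mp3 resource and a DateLessThan timestamp of 1517350745 (Unix time), which hints that an expiry time is baked into each generated link.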

Still the user needs to copy the url and open it in a new tab. Let’s improve the situation by implementing automatic redirection to the MP3 page:

from soundcloudDownload import get_link
from flask import Flask, request, redirect

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

@app.route("/generate_link")
def generate_link():
    media_url = request.args.get('url','')
    return redirect(get_link(media_url), code=302)

Now when you try opening the same generate_link url in your browser you should be redirected to an mp3 file. Great! Everything is working perfectly fine and as promised you have reverse engineered the soundcloud web app and figured out a way to download mp3 files.
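One thing to keep in mind: /generate_link will fetch whatever address it is given. If you plan to expose the app to other people, it may be worth rejecting anything that is not a Soundcloud page before calling get_link. Here is a minimal sketch of such a check. The is_soundcloud_url helper and the ALLOWED_HOSTS list are my own assumptions, so adjust the hostnames to whatever you actually want to accept.

```python
from urllib.parse import urlsplit

# Assumed set of acceptable hostnames; extend or trim as needed.
ALLOWED_HOSTS = {"soundcloud.com", "www.soundcloud.com", "m.soundcloud.com"}


def is_soundcloud_url(candidate):
    """Return True only for http(s) urls whose host is in ALLOWED_HOSTS."""
    try:
        parts = urlsplit(candidate)
    except ValueError:
        # urlsplit raises ValueError on some malformed inputs.
        return False
    return parts.scheme in ("http", "https") and parts.hostname in ALLOWED_HOSTS


print(is_soundcloud_url("https://soundcloud.com/m-yasoob-khalid/shutdown"))
print(is_soundcloud_url("https://soundcloud.com.evil.example/track"))
```

Inside generate_link you could call this first and return a 400 response when it comes back False.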

4. Further Steps

Now we can go ahead and implement a usable web interface for this web API, but I will leave that as an exercise for the reader. Search online for how to use Jinja templates with Flask and then make a front-end for this. You can also create a browser extension which injects a download button into all of the Soundcloud media pages. That way the user won’t even have to copy the url. They can simply click the download button and the download will start. The end goal is to remove as many steps as possible and streamline the process. A simple rule of thumb is that the fewer steps a task requires, the more usable a service/app is.

I might turn this into a web app with search functionality and an MP3 player. In order to stay tuned please follow my blog.

 


6 thoughts on “Reverse Engineering Soundcloud API”

  1. I don’t think this is really ethical, the thing you’re doing. SoundCloud suffers as it is, it’s a very useful service which doesn’t have a particularly good business model, permanently on the verge of shutting down. When you teach people to reuse SC’s backend, it only serves to further damage the company (by driving the costs up), bringing closer the impending doom.

    I’d rather drain some other company’s resources, there are plenty of malvertising businesses out there with very deep pockets, such as Facebook, Google.

    Just my $0.05.

    1. I totally agree. This was more of a guide on how to reverse-engineer the mp3 link generation in general. That is why I walked through all the steps and didn’t give out the code in the very beginning. Either Way, I will keep your suggestion in mind while writing the next post.

    2. You’re right Rei-Chan, and I do like SoundCloud and hope for the best for its life continuity, but I think the Emphasize here is not about teaching people to “steal from SoundCloud”. Yasoob is just trying to explain how he achieve what he did, step by step in a practical tutorial.

      Which he did it very impressive and very helpful for someone like me who are new to Python.
      This tutorial have very good practical aspects of at least the following things:
      – Reverse engineering a service (or I’d like to call it understanding how things work)
      – Design a flow to automate the process
      – Using Python to automate the flow
      – Practical example of library like Requests and Flask can be used
      – How to structure it properly

      I’d like to thank Yasoob for his works! keep it up! It’s very helpful for me personally. I like how you make a tutorial that is practical and have a real world application. It does really help me build similar scraping tools for other service. I’m new to Python but have been doing some web dev in Node and PHP. Starting to like the beauty of Python

      Thanks again.

  2. “You can also create a browser extension which injects a download button to all of the soundcloud media pages.”

    How would you do that? With Python or JS? That’s interesting, I used to create apps, but never browser extensions. Could you provide any links so I can read more about it please? Thanks for astonishing guide.

    1. I am sorry I don’t really have any link handy. There are two ways you can go about doing this. First one is to create a browser extension and converting this Python code into Javascript code. The second one is to convert the python code into a Flask based API, run the API in the backend, create a browser extension which can pass a Soundcloud URL to the backend API and download the music from the url returned by the Flask API. I hope it helps 🙂
