How to Bypass Soundcloud's Bot Protection
DISCLAIMER: This is for educational purposes! Do not repeat anything seen here!
NOTE: Actual bypass code is not given; pseudocode is provided instead.
1. Motivations
I wanted to make a Discord rich presence for Soundcloud, since it is my primary
"streaming" service. That requires the API, and to use the API you need a
"Client ID" and a "Client Secret", both of which you can only get with an official Soundcloud app.
To get one, you have to email Soundcloud support and request API access.
I tried this, and my request was, sadly, rejected with what is (in my opinion) a bogus
reason:
...and yes, their "Discord integration" is just a bot that lets you play music. Not what I want.
2. Figuring out the (internal) API
I know that Soundcloud
has to use its own API to get information like listen history,
among other things. Looking at the network requests, I found the endpoint I wanted:
/me/play-history/tracks.
The problem was that whenever I replayed the exact same API request, it would eventually
return HTTP 403. Looking at the headers, I saw an "anomaly":
X-Datadome-ClientID.
I also noticed that it changes every time a request goes through, so this was likely the reason
my requests were failing.
2.1: What's Datadome?
Datadome is a "bot protection" company. Essentially, they restrict bots and "AI agents" from
accessing a website. Going to their homepage confirmed my suspicion that Soundcloud does
in fact use it. So if I want to use Soundcloud's API without having my API access request granted,
I'll need to bypass this.
3. Bypassing Datadome
Luckily, I noticed a few patterns:
- Every single Client ID was 128 characters long.
- There were 3 common separators:
["~", "_", "~_"]
- It was primarily made up of letters, but occasionally some numbers were sprinkled in.
- There were anywhere from six to nine separators per Client ID.
Based on these four patterns, I was able to come up with a bypass*, written in about five
minutes. Below is pseudocode:
function generate_clientid(){
    separator_count = Random(6,9)
    separators = ["~", "_", "~_"]
    result = ""
    every = int(128/separator_count)
    for i in 0 -> 129 {
        if i mod every == 0 and i != 0 {
            chosen = Random(0, len(separators)-1)
            result += separators[chosen]
            i += separators[chosen].length
        }
        result += alphabet[Random(0, alphabet.length - 1)]
        i++
    }
    return result
}
Now all that was needed was to fire up a script that would HTTP GET
/me/play-history/tracks and
if it returned HTTP 403, regenerate the Client ID and redo the request. Weirdly, Soundcloud actually
does not have a rate limit! The only rate limit it has is on
Play Requests, which is not important to us
at all. To quote the Soundcloud API documentation:
We currently do not enforce any limit on the total number of calls made by a client application in aggregate.
Shockingly, my script actually works, with a success rate of 65.35% over 101 calls (66 successes, 35 failures).
As long as I'm able to keep up with these requests and actually maintain accuracy with my Discord RPC
code, then it's fine by me, and I'd count it as a success in my book!
4. Conclusion
This was a super fun project. Because I was able to get one API endpoint working without requiring
an approved API app, that technically means I'm able to use almost any API endpoint and pretty much
interface with Soundcloud as if I were just a normal browser.
I don't actually plan on doing this; this "project" was mainly a fun exercise (I was also kinda
mad about how convoluted Soundcloud's API is). Thank you for reading! :D