Have you ever wanted to scrape comments from a TikTok video to analyze audience reactions or gather data for a project? In this tutorial, I’ll guide you step by step through the process of scraping TikTok comments using Python and the requests library. We’ll break down the code, explain its components, and ensure that you understand everything, even if you’re new to web scraping. Let’s dive in!
If you’d like, you can watch the video tutorial along with this blog.
Let’s kick things off by importing the necessary libraries and setting up our base URL. This URL will serve as the foundation for our API requests:
import requests
import json
We use the requests library to handle HTTP requests and json to work with JSON data (TikTok’s response format). These are essential for fetching the data from the TikTok server.
post_url = 'https://www.tiktok.com/@pythonblyat5/video/7347374025221393696'
post_id = post_url.split('/')[-1]
The post_id is extracted from the URL using split('/')[-1], which fetches the last part of the URL (the video ID). This post_id will be used to query TikTok’s API.

When making web requests, it’s important to mimic a real browser. This is done using HTTP headers. The headers dictionary contains several key-value pairs that make your request look like it’s coming from a regular browser (specifically Google Chrome on Windows in this case). This helps to prevent the server from blocking the request.
headers = {
    'accept': '*/*',
    'accept-language': 'en-US,en;q=0.9,fa;q=0.8',
    'cache-control': 'no-cache',
    'pragma': 'no-cache',
    'priority': 'u=1, i',
    'referer': 'https://www.tiktok.com/explore',
    'sec-ch-ua': '"Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
}
comments = []
comments.append({'post_url': post_url})
curs = 0
comments: We initialize an empty list to store the comments we’ll scrape. The first item in the list is a dictionary containing the post URL for reference.
curs: This variable keeps track of the pagination cursor, which allows us to load more comments in subsequent requests.
def parser(data):
    comment = data['comments']
    for cm in comment:
        com = cm['share_info']['desc']
        if com == "":
            com = cm['text']
        comments.append(com)
        print(com)
    return data
The parser function extracts the comments from the JSON data returned by the TikTok API. data['comments'] accesses the list of comments in the response. For each comment, we read the desc (description) field inside share_info; if it is empty, we fall back to the text field. Every extracted comment is appended to the comments list and printed.
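Since TikTok’s response schema can change without notice, a more defensive variant of this parser is worth sketching. The snippet below is an illustration, not the tutorial’s exact code: parse_comments and the sample payload are hypothetical, and .get() lookups replace direct indexing so a missing share_info or desc key doesn’t raise a KeyError.

```python
def parse_comments(data, out):
    # .get() lookups avoid KeyError if TikTok omits 'share_info' or 'desc'.
    for cm in data.get('comments') or []:
        text = cm.get('share_info', {}).get('desc') or cm.get('text', '')
        if text:
            out.append(text)
    return data

# Hypothetical payload shaped like the fields the tutorial relies on.
sample = {'comments': [
    {'share_info': {'desc': ''}, 'text': 'fallback works'},
    {'share_info': {'desc': 'primary field'}},
]}

collected = []
parse_comments(sample, collected)
print(collected)  # → ['fallback works', 'primary field']
```

The `or []` guard also means an error response without a comments list simply yields nothing instead of crashing the scraper mid-run.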
def req(post_id, curs):
    url = f"https://www.tiktok.com/api/comment/list/?WebIdLastTime=1727614026&aid=1988&...&cursor={curs}&aweme_id={post_id}"
    response = requests.get(url, headers=headers)
    info = response.text
    raw_data = json.loads(info)
    return raw_data
The req function sends a GET request to TikTok’s API to fetch the comments. It takes two arguments: post_id (the video ID) and curs (the cursor for pagination). response.text contains the raw JSON response from the server, and json.loads(info) converts that JSON string into a Python dictionary.
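As an aside, the long query string can also be assembled with urllib.parse.urlencode instead of a hand-written f-string. The sketch below is hypothetical (build_comment_url is not part of the tutorial) and includes only a few of the many parameters the real endpoint expects; session-bound values such as msToken and X-Bogus are omitted, so it illustrates the structure rather than a working call:

```python
from urllib.parse import urlencode

BASE = 'https://www.tiktok.com/api/comment/list/'

def build_comment_url(post_id, cursor, count=20):
    # A small subset of the real endpoint's query parameters; tokens like
    # msToken and X-Bogus (required by the live API) are deliberately omitted.
    params = {'aid': 1988, 'aweme_id': post_id, 'count': count, 'cursor': cursor}
    return BASE + '?' + urlencode(params)

url = build_comment_url('7347374025221393696', 0)
print(url)
# requests.get(url, headers=headers) would then fetch this page of comments.
```

Building the URL from a dict makes it easier to tweak count or cursor without editing one very long string.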
while True:
    data = req(post_id, curs)
    same_data = parser(data)
    if same_data['has_more'] == 1:
        curs += 20
        print('moving to the next cursor')
    else:
        break
We use a while loop to continuously scrape comments: the req function fetches the data for the current cursor position, and the parser function extracts the comments. If the has_more field in the response is 1, there are more comments to fetch, so we increment the cursor by 20 to move to the next set of comments.
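The loop’s control flow can be sketched against canned data. Here fetch_page is a hypothetical stand-in for req, so the pagination logic runs without touching TikTok’s API:

```python
def fetch_page(cursor):
    # Stand-in for req(): returns canned pages keyed by cursor position,
    # mimicking the 'comments' and 'has_more' fields the real API returns.
    pages = {
        0:  {'comments': ['first', 'second'], 'has_more': 1},
        20: {'comments': ['third'], 'has_more': 0},
    }
    return pages[cursor]

all_comments = []
cursor = 0
while True:
    page = fetch_page(cursor)
    all_comments.extend(page['comments'])
    if page.get('has_more') == 1:
        cursor += 20  # same step size as the tutorial's curs += 20
    else:
        break

print(all_comments)  # → ['first', 'second', 'third']
```

The loop terminates as soon as has_more stops being 1, which is exactly how the scraper knows it has reached the last page of comments.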
with open('output.json', 'w', encoding='utf-8') as f:
    json.dump(comments, f, ensure_ascii=False, indent=4)
Once all the comments are scraped, we save them to a JSON file (output.json). indent=4 keeps the file readable, and ensure_ascii=False writes non-ASCII characters (e.g., emojis or special characters) as-is instead of escaping them to \uXXXX sequences.
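A quick standalone comparison shows exactly what ensure_ascii controls:

```python
import json

sample = ['Great video! 🔥']

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
print(json.dumps(sample))                      # ["Great video! \ud83d\udd25"]

# ensure_ascii=False keeps the characters readable in the output file.
print(json.dumps(sample, ensure_ascii=False))  # ["Great video! 🔥"]
```

Both forms are valid JSON; the second is simply easier to read when you open output.json in an editor.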
In this tutorial, you learned how to scrape TikTok comments using Python and the requests library. Here’s a quick recap: we used requests for HTTP requests and json to handle TikTok’s JSON response. The full script is below:
import requests
import json
post_url = 'https://www.tiktok.com/@pythonblyat5/video/7347374025221393696'
post_id = post_url.split('/')[-1]
headers = {
    'accept': '*/*',
    'accept-language': 'en-US,en;q=0.9,fa;q=0.8',
    'cache-control': 'no-cache',
    'pragma': 'no-cache',
    'priority': 'u=1, i',
    'referer': 'https://www.tiktok.com/explore',
    'sec-ch-ua': '"Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
}
comments = []
comments.append({'post_url':post_url})
curs = 0
def parser(data):
    comment = data['comments']
    for cm in comment:
        com = cm['share_info']['desc']
        if com == "":
            com = cm['text']
        comments.append(com)
        print(com)
    return data
def req(post_id, curs):
    url = f"https://www.tiktok.com/api/comment/list/?WebIdLastTime=1727614026&aid=1988&app_language=en&app_name=tiktok_web&aweme_id={post_id}&browser_language=en-US&browser_name=Mozilla&browser_online=true&browser_platform=Win32&browser_version=5.0%20%28Windows%20NT%2010.0%3B%20Win64%3B%20x64%29%20AppleWebKit%2F537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome%2F129.0.0.0%20Safari%2F537.36&channel=tiktok_web&cookie_enabled=true&count=20&cursor={curs}&data_collection_enabled=true&device_id=7420045675172267552&device_platform=web_pc&focus_state=true&from_page=video&history_len=9&is_fullscreen=false&is_page_visible=true&odinId=7420045705102722080&os=windows&priority_region=&referer=&region=GB&screen_height=864&screen_width=1536&tz_name=Asia%2FTehran&user_is_login=false&webcast_language=en&msToken=LACQZNSB6gTZGDBhIqrVKgoO19z0F5AUS9BgaOy6Y0PI5qokvf8CrE01SHTkGr296aSEazAVJNBRMo3E3MrB2Dz4dEIMfNjmfFdz1Gra7Kb3BZpNrZesv2bwtBjLEQqf572Gz3xrA-7ju9L7y-p1z0vmCRc=&X-Bogus=DFSzswVOGPUANtN1t69wzMSwXQM1&_signature=_02B4Z6wo00001xqLMzgAAIDDkLHDRCQFA1saizeAAKBlda"
    response = requests.get(url, headers=headers)
    info = response.text
    raw_data = json.loads(info)
    return raw_data
while True:
    data = req(post_id, curs)
    same_data = parser(data)
    if same_data['has_more'] == 1:
        curs += 20
        print('moving to the next cursor')
    else:
        break
with open('output.json', 'w', encoding='utf-8') as f:
    json.dump(comments, f, ensure_ascii=False, indent=4)
print("\ndata has been saved to output.json")