Have you ever wanted to scrape comments from a TikTok video to analyze audience reactions or gather data for a project? In this tutorial, I'll guide you step by step through the process of scraping TikTok comments using Python and the requests library. We'll break down the code, explain its components, and make sure you understand everything, even if you're new to web scraping. Let's dive in!
If you’d like, you can watch the video tutorial along with this blog.
Let's kick things off by importing the necessary libraries and setting up our base URL. This URL will serve as the foundation for our API requests:
import requests
import json
We use the requests library to handle HTTP requests and json to work with JSON data (TikTok's response format). Both are essential for fetching data from the TikTok server.
post_url = 'https://www.tiktok.com/@pythonblyat5/video/7347374025221393696'
post_id = post_url.split('/')[-1]
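The split('/')[-1] trick above simply takes the last path segment. If your URL ever carries a trailing slash or a query string, a slightly more defensive sketch using the standard urllib.parse module looks like this (extract_post_id is a hypothetical helper name, not part of the tutorial's code):

```python
from urllib.parse import urlparse

# Strip any query string and trailing slash before taking the last
# path segment, so '?lang=en' or a trailing '/' can't leak into the ID.
def extract_post_id(post_url: str) -> str:
    path = urlparse(post_url).path
    return path.rstrip('/').split('/')[-1]

print(extract_post_id('https://www.tiktok.com/@pythonblyat5/video/7347374025221393696'))
# -> 7347374025221393696
```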
The post_id is extracted from the URL using split('/')[-1], which fetches the last part of the URL (the video ID). This post_id will be used to query TikTok's API.

When making web requests, it's important to mimic a real browser. This is done using HTTP headers. The headers dictionary contains several key-value pairs that make your request look like it's coming from a regular browser (specifically Google Chrome on Windows in this case), which helps prevent the server from blocking the request.
headers = {
    'accept': '*/*',
    'accept-language': 'en-US,en;q=0.9,fa;q=0.8',
    'cache-control': 'no-cache',
    'pragma': 'no-cache',
    'priority': 'u=1, i',
    'referer': 'https://www.tiktok.com/explore',
    'sec-ch-ua': '"Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
}
comments = []
comments.append({'post_url': post_url})
curs = 0
comments: We initialize an empty list to store the comments we'll scrape. The first item in the list is a dictionary containing the post URL for reference.

curs: This variable keeps track of the pagination cursor, which will allow us to load more comments in subsequent requests.
def parser(data):
    comment = data['comments']
    for cm in comment:
        com = cm['share_info']['desc']
        if com == "":
            com = cm['text']
        comments.append(com)
        print(com)
    return data
The parser function extracts the comments from the JSON data returned by the TikTok API.
data['comments']: We access the list of comments in the response.

If a comment's share_info desc (description) field is empty, we fall back to the text field.

Each extracted comment is appended to the comments list and printed.
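To see the fallback in action without calling the API, here is the same extraction logic run against a mocked response. The field names mirror the ones the tutorial reads; the comment texts themselves are invented for illustration:

```python
# A mocked API response with the same shape the parser expects:
# one comment with a share description, one where it is empty.
sample = {
    'comments': [
        {'share_info': {'desc': 'Great video!'}, 'text': 'Great video!'},
        {'share_info': {'desc': ''}, 'text': 'plain text comment'},
    ]
}

extracted = []
for cm in sample['comments']:
    com = cm['share_info']['desc']
    if com == "":
        com = cm['text']  # fall back to the text field when desc is empty
    extracted.append(com)

print(extracted)
# -> ['Great video!', 'plain text comment']
```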
def req(post_id, curs):
    url = f"https://www.tiktok.com/api/comment/list/?WebIdLastTime=1727614026&aid=1988&...&cursor={curs}&aweme_id={post_id}"
    response = requests.request("GET", url, headers=headers)
    info = response.text
    raw_data = json.loads(info)
    return raw_data
The req function sends a GET request to TikTok’s API to fetch the comments.
It takes two parameters: post_id (the video ID) and curs (the cursor for pagination).

response.text contains the raw JSON response from the server.

json.loads(info) converts the raw JSON string into a Python dictionary.
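Hand-assembling that long query string is error-prone. As an alternative sketch, you can let requests build the URL from a params dictionary. Only a handful of illustrative parameters are shown here; the real endpoint expects many more (msToken, X-Bogus, _signature, and so on), which are copied from a live browser session and will differ for you:

```python
import requests

# Build (but don't send) the comment-list request from a params dict.
# build_comment_request is a hypothetical helper for demonstration.
def build_comment_request(post_id, curs):
    params = {
        'aid': 1988,
        'aweme_id': post_id,
        'count': 20,
        'cursor': curs,
    }
    return requests.Request(
        'GET', 'https://www.tiktok.com/api/comment/list/', params=params
    ).prepare()

prepared = build_comment_request('7347374025221393696', 0)
print(prepared.url)
# -> https://www.tiktok.com/api/comment/list/?aid=1988&aweme_id=7347374025221393696&count=20&cursor=0
```

In practice you would send the request with requests.get(url, headers=headers, params=params) and call response.json(), which is equivalent to json.loads(response.text).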
while 1:
    data = req(post_id, curs)
    same_data = parser(data)
    if same_data['has_more'] == 1:
        curs += 20
        print('moving to the next cursor')
    else:
        break
We use a while loop to continuously scrape comments:
The req function fetches the data for the current cursor position.

The parser function extracts the comments.

If the has_more field in the response is 1, it means there are more comments to fetch, so we increment the cursor by 20 to move to the next set of comments; otherwise we break out of the loop.
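To see the control flow of this loop without touching the network, here is the same pattern exercised against a fake fetcher. fake_req and its 45 pretend comments are stand-ins that only mimic the shape of the response (a comments list plus a has_more flag):

```python
# A fake fetcher serving 45 pretend comments, 20 per page, so the
# cursor/has_more pagination pattern can be run locally.
def fake_req(post_id, curs, total=45, page=20):
    batch = list(range(curs, min(curs + page, total)))
    return {'comments': batch, 'has_more': 1 if curs + page < total else 0}

collected = []
curs = 0
while True:
    data = fake_req('123', curs)
    collected.extend(data['comments'])
    if data['has_more'] == 1:
        curs += 20          # advance to the next page of comments
    else:
        break               # no more pages to fetch

print(len(collected))
# -> 45
```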
with open('output.json', 'w', encoding='utf-8') as f:
    json.dump(comments, f, ensure_ascii=False, indent=4)
Once all the comments are scraped, we save them to a JSON file (output.json). The indent=4 argument keeps the file in a readable format, and ensure_ascii=False writes non-ASCII characters (e.g., emojis or special characters) as-is instead of escaping them.
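A quick illustration of what ensure_ascii=False changes:

```python
import json

# With the default ensure_ascii=True, non-ASCII characters are written
# as \u escape sequences; with ensure_ascii=False they are kept as-is.
data = ['Nice video 🔥']
print(json.dumps(data))
# -> ["Nice video \ud83d\udd25"]
print(json.dumps(data, ensure_ascii=False))
# -> ["Nice video 🔥"]
```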
In this tutorial, you learned how to scrape TikTok comments using Python and the requests library. Here’s a quick recap:
We used requests for HTTP requests and json for handling TikTok's JSON response. We extracted the video ID from the post URL, sent browser-like headers to avoid being blocked, paged through the comment API with a cursor until has_more was no longer 1, and saved everything to output.json. Here is the full code:
import requests, json

post_url = 'https://www.tiktok.com/@pythonblyat5/video/7347374025221393696'
post_id = post_url.split('/')[-1]

headers = {
    'accept': '*/*',
    'accept-language': 'en-US,en;q=0.9,fa;q=0.8',
    'cache-control': 'no-cache',
    'pragma': 'no-cache',
    'priority': 'u=1, i',
    'referer': 'https://www.tiktok.com/explore',
    'sec-ch-ua': '"Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
}

comments = []
comments.append({'post_url': post_url})
curs = 0

def parser(data):
    comment = data['comments']
    for cm in comment:
        com = cm['share_info']['desc']
        if com == "":
            com = cm['text']
        comments.append(com)
        print(com)
    return data

def req(post_id, curs):
    url = f"https://www.tiktok.com/api/comment/list/?WebIdLastTime=1727614026&aid=1988&app_language=en&app_name=tiktok_web&aweme_id={post_id}&browser_language=en-US&browser_name=Mozilla&browser_online=true&browser_platform=Win32&browser_version=5.0%20%28Windows%20NT%2010.0%3B%20Win64%3B%20x64%29%20AppleWebKit%2F537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome%2F129.0.0.0%20Safari%2F537.36&channel=tiktok_web&cookie_enabled=true&count=20&cursor={curs}&data_collection_enabled=true&device_id=7420045675172267552&device_platform=web_pc&focus_state=true&from_page=video&history_len=9&is_fullscreen=false&is_page_visible=true&odinId=7420045705102722080&os=windows&priority_region=&referer=&region=GB&screen_height=864&screen_width=1536&tz_name=Asia%2FTehran&user_is_login=false&webcast_language=en&msToken=LACQZNSB6gTZGDBhIqrVKgoO19z0F5AUS9BgaOy6Y0PI5qokvf8CrE01SHTkGr296aSEazAVJNBRMo3E3MrB2Dz4dEIMfNjmfFdz1Gra7Kb3BZpNrZesv2bwtBjLEQqf572Gz3xrA-7ju9L7y-p1z0vmCRc=&X-Bogus=DFSzswVOGPUANtN1t69wzMSwXQM1&_signature=_02B4Z6wo00001xqLMzgAAIDDkLHDRCQFA1saizeAAKBlda"
    response = requests.request("GET", url, headers=headers)
    info = response.text
    raw_data = json.loads(info)
    return raw_data

while 1:
    data = req(post_id, curs)
    same_data = parser(data)
    if same_data['has_more'] == 1:
        curs += 20
        print('moving to the next cursor')
    else:
        break

with open('output.json', 'w', encoding='utf-8') as f:
    json.dump(comments, f, ensure_ascii=False, indent=4)

print("\ndata has been saved...., good job amir")