
How to Scrape TikTok Comments With Python Requests 🐍💬

Introduction 🎓

Have you ever wanted to scrape comments from a TikTok video to analyze audience reactions or gather data for a project? In this tutorial, I'll guide you step by step through the process of scraping TikTok comments using Python and the requests library. We'll break down the code, explain its components, and ensure that you understand everything, even if you're new to web scraping. Let's dive in! 🚀

Β 

If you'd like, you can watch the video tutorial along with this blog.

Importing Libraries 📦

Let's kick things off by importing the libraries we need:

				
import requests
import json

				
			
  • Here, we import the requests library to handle HTTP requests and json to work with JSON data (TikTok's response format). Both are needed to fetch and decode the data from the TikTok server.

Defining the TikTok Post URL 🎯

				
post_url = 'https://www.tiktok.com/@pythonblyat5/video/7347374025221393696'
post_id = post_url.split('/')[-1]

				
			
  • We define the TikTok post URL from which we want to scrape the comments. The post_id is extracted from the URL using split('/')[-1], which fetches the last part of the URL (the video ID). This only works when the URL has no trailing slash or query string after the ID. This post_id will be used to query TikTok's API.
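
The split('/')[-1] approach assumes the URL ends exactly with the video ID. As a minimal sketch, the helper below uses the standard library's urllib.parse so that links with a query string or trailing slash still yield the ID. The name extract_post_id and the ?lang=en suffix are illustrative assumptions, not part of the original script:

```python
from urllib.parse import urlparse


def extract_post_id(post_url):
    """Return the last path segment of a TikTok video URL (hypothetical helper)."""
    # urlparse separates the path from any ?query or #fragment part
    path = urlparse(post_url).path
    # Drop a trailing slash, then take the text after the final '/'
    return path.rstrip('/').split('/')[-1]


post_url = 'https://www.tiktok.com/@pythonblyat5/video/7347374025221393696?lang=en'
post_id = extract_post_id(post_url)
print(post_id)
```

For the plain URL used in this tutorial, the helper returns the same value as split('/')[-1].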

Setting Up Headers 🛡️

When making web requests, it's important to mimic a real browser, which is done with HTTP headers. The headers dictionary contains key-value pairs that make your request look like it's coming from a regular browser (here, Google Chrome 129 on Windows). This reduces the chance of the server blocking the request. 🔒

				
headers = {
    'accept': '*/*',
    'accept-language': 'en-US,en;q=0.9,fa;q=0.8',
    'cache-control': 'no-cache',
    'pragma': 'no-cache',
    'priority': 'u=1, i',
    'referer': 'https://www.tiktok.com/explore',
    'sec-ch-ua': '"Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
}
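
The Chrome version appears in both sec-ch-ua and user-agent, so updating one by hand and forgetting the other would leave the headers inconsistent. A minimal sketch, assuming you want to generate a trimmed-down version of the dictionary above from a single version number. build_headers is a hypothetical helper, not part of the original script, and whether TikTok actually checks this consistency is not something this tutorial verifies:

```python
def build_headers(chrome_major=129):
    """Build browser-like headers with one consistent Chrome version (hypothetical helper)."""
    version = str(chrome_major)
    return {
        'accept': '*/*',
        'accept-language': 'en-US,en;q=0.9',
        'referer': 'https://www.tiktok.com/explore',
        # The same major version is used in the client-hint and user-agent headers
        'sec-ch-ua': f'"Google Chrome";v="{version}", "Not=A?Brand";v="8", "Chromium";v="{version}"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'user-agent': (
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
            f'(KHTML, like Gecko) Chrome/{version}.0.0.0 Safari/537.36'
        ),
    }


headers = build_headers(129)
print(headers['user-agent'])
```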

				
			

Initializing Variables 🚀

				
comments = []
comments.append({'post_url': post_url})
curs = 0

				
			
  • comments: We initialize an empty list to store the comments we'll scrape. The first item in the list is a dictionary containing the post URL for reference.
  • curs: This variable keeps track of the pagination cursor, which will allow us to load more comments in subsequent requests.

Defining the Parser Function 🧩

				
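def parser(data):
    # Guard against a missing or null 'comments' field
    comment = data.get('comments') or []

    for cm in comment:
        # Prefer the share description; fall back to the plain comment text
        com = (cm.get('share_info') or {}).get('desc', '')

        if com == "":
            com = cm.get('text', '')

        comments.append(com)
        print(com)
    # Return the page unchanged so the caller can read 'has_more'
    return data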

				
			
  • The parser function extracts the comments from the JSON data returned by the TikTok API.

    • data['comments']: We access the list of comments in the response.
    • We loop through each comment and extract the comment text. If the desc (description) field is empty, we fall back to the text field.
    • Each comment is appended to the comments list and printed.
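
The extraction rule described above can also be written as a function that returns a list instead of appending to a global one, which makes it easy to try on sample data. This is a sketch under assumptions: extract_texts is a hypothetical name, and sample_page is hand-written to mirror only the fields this tutorial uses, not real API output:

```python
def extract_texts(data):
    """Return the comment texts found in one response page."""
    texts = []
    # Treat a missing or null 'comments' field as an empty page
    for cm in data.get('comments') or []:
        share_info = cm.get('share_info') or {}
        # Same rule as parser(): use desc unless it is empty, then use text
        text = share_info.get('desc') or cm.get('text', '')
        texts.append(text)
    return texts


# Hand-written sample shaped like the fields used in this tutorial (not real API output)
sample_page = {
    'comments': [
        {'text': 'first', 'share_info': {'desc': 'someone commented: first'}},
        {'text': 'second', 'share_info': {'desc': ''}},
    ],
    'has_more': 0,
}
print(extract_texts(sample_page))
```

The first sample comment keeps its desc value, while the second falls back to text because its desc is empty.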

Making the API Request 🔄

				
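def req(post_id, curs):
    # The "..." stands for the remaining query parameters shown in the complete code
    url = f"https://www.tiktok.com/api/comment/list/?WebIdLastTime=1727614026&aid=1988&...&cursor={curs}&aweme_id={post_id}"
    # timeout keeps the script from hanging forever if the server never answers
    response = requests.request("GET", url, headers=headers, timeout=10)

    info = response.text
    raw_data = json.loads(info)

    return raw_data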

				
			

The req function sends a GET request to TikTok's API to fetch the comments.

  • The URL contains multiple parameters like the post_id (the video ID) and curs (the cursor for pagination).
  • response.text contains the raw JSON response from the server.
  • json.loads(info) converts the raw JSON string into a Python dictionary.
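
Long query strings are easier to read and edit when they are assembled from a dictionary. The sketch below does this with the standard library's urlencode. It is illustrative only: build_comment_url is a hypothetical helper, and it includes just four of the parameters from the tutorial's URL, so the resulting URL on its own may not be accepted by TikTok:

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.tiktok.com/api/comment/list/'


def build_comment_url(post_id, curs, count=20):
    """Assemble the comment-list URL from a parameter dictionary (illustrative subset)."""
    params = {
        'aid': 1988,
        'aweme_id': post_id,
        'count': count,
        'cursor': curs,
        # The complete script sends many more parameters (msToken, X-Bogus, ...);
        # they are left out of this sketch.
    }
    # urlencode joins the pairs with '&' and percent-encodes values where needed
    return f'{BASE_URL}?{urlencode(params)}'


print(build_comment_url('7347374025221393696', 20))
```

With this layout, changing the cursor or page size means editing one dictionary entry rather than hunting through a long string.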

Loop to Keep Scraping 🌀

				
while True:
    # Fetch one page of comments and collect their text
    data = req(post_id, curs)
    same_data = parser(data)

    if same_data['has_more'] == 1:
        # Advance by one page of 20 comments
        curs += 20
        print('moving to the next cursor')
    else:
        break

				
			

We use a while loop to continuously scrape comments:

  • The req function fetches the data for the current cursor position.
  • The parser function extracts the comments.
  • If the has_more field in the response is 1, it means there are more comments to fetch, so we increment the cursor by 20 (the page size set by count=20 in the full request URL) to move to the next set of comments.
  • If no more comments are available, the loop breaks.
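
The loop logic described above can be sketched as a generator with a page limit, so a response that always reports has_more = 1 cannot keep the script running forever. Everything named here is an assumption for illustration: iter_pages, max_pages and delay are hypothetical, and fake_fetch stands in for req so the example runs without network access:

```python
import time


def iter_pages(fetch_page, post_id, page_size=20, max_pages=50, delay=0.0):
    """Yield response pages until has_more is no longer 1 or max_pages is reached."""
    curs = 0
    for _ in range(max_pages):
        data = fetch_page(post_id, curs)
        yield data
        if data.get('has_more') != 1:
            break
        curs += page_size
        time.sleep(delay)  # optional pause between requests


# Fake fetcher standing in for req(), so the sketch runs offline
def fake_fetch(post_id, curs):
    return {'comments': [{'text': f'comment {curs}'}], 'has_more': 1 if curs < 40 else 0}


pages = list(iter_pages(fake_fetch, '123'))
print(len(pages))
```

In the real script you would pass req as fetch_page and call parser on each yielded page.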

Saving the Comments to a File 💾

				
with open('output.json', 'w', encoding='utf-8') as f:
    json.dump(comments, f, ensure_ascii=False, indent=4)

				
			

Once all the comments are scraped, we save them to a JSON file (output.json). indent=4 makes the file easy to read, and ensure_ascii=False writes non-ASCII characters such as emojis as they are instead of escaping them as \uXXXX sequences. Opening the file with encoding='utf-8' makes sure those characters are stored correctly.
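
To see what ensure_ascii changes, here is a small sketch that serializes the same string both ways. The sample comment is made up for illustration:

```python
import json

comment = 'nice video 🚀'

# Default behaviour: non-ASCII characters become \uXXXX escape sequences
escaped = json.dumps(comment)
# ensure_ascii=False keeps the characters as they are
readable = json.dumps(comment, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings decode back to the same Python value
assert json.loads(escaped) == json.loads(readable) == comment
```

Either form is valid JSON. The difference only matters when a person reads the file.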

The Complete Code 📝

In this tutorial, you learned how to scrape TikTok comments using Python and the requests library. Here's a quick recap:

  1. Importing libraries: We used requests for HTTP requests and json for handling TikTok's JSON response.
  2. Setting up headers: We mimicked a browser using HTTP headers to avoid being blocked.
  3. Fetching and parsing comments: We used a loop and a cursor system to scrape comments in batches.
  4. Saving the data: Finally, we saved the scraped comments into a JSON file.

Here is the complete code:
				
import requests
import json


post_url = 'https://www.tiktok.com/@pythonblyat5/video/7347374025221393696'

post_id = post_url.split('/')[-1]

headers = {
  'accept': '*/*',
  'accept-language': 'en-US,en;q=0.9,fa;q=0.8',
  'cache-control': 'no-cache',
  'pragma': 'no-cache',
  'priority': 'u=1, i',
  'referer': 'https://www.tiktok.com/explore',
  'sec-ch-ua': '"Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"',
  'sec-ch-ua-mobile': '?0',
  'sec-ch-ua-platform': '"Windows"',
  'sec-fetch-dest': 'empty',
  'sec-fetch-mode': 'cors',
  'sec-fetch-site': 'same-origin',
  'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
}

comments = []
comments.append({'post_url':post_url})

curs = 0

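def parser(data):
    # Guard against a missing or null 'comments' field
    comment = data.get('comments') or []

    for cm in comment:
        # Prefer the share description; fall back to the plain comment text
        com = (cm.get('share_info') or {}).get('desc', '')

        if com == "":
            com = cm.get('text', '')

        comments.append(com)
        print(com)
    # Return the page unchanged so the caller can read 'has_more'
    return data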


def req(post_id,curs):
    url = f"https://www.tiktok.com/api/comment/list/?WebIdLastTime=1727614026&aid=1988&app_language=en&app_name=tiktok_web&aweme_id={post_id}&browser_language=en-US&browser_name=Mozilla&browser_online=true&browser_platform=Win32&browser_version=5.0%20%28Windows%20NT%2010.0%3B%20Win64%3B%20x64%29%20AppleWebKit%2F537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome%2F129.0.0.0%20Safari%2F537.36&channel=tiktok_web&cookie_enabled=true&count=20&cursor={curs}&data_collection_enabled=true&device_id=7420045675172267552&device_platform=web_pc&focus_state=true&from_page=video&history_len=9&is_fullscreen=false&is_page_visible=true&odinId=7420045705102722080&os=windows&priority_region=&referer=&region=GB&screen_height=864&screen_width=1536&tz_name=Asia%2FTehran&user_is_login=false&webcast_language=en&msToken=LACQZNSB6gTZGDBhIqrVKgoO19z0F5AUS9BgaOy6Y0PI5qokvf8CrE01SHTkGr296aSEazAVJNBRMo3E3MrB2Dz4dEIMfNjmfFdz1Gra7Kb3BZpNrZesv2bwtBjLEQqf572Gz3xrA-7ju9L7y-p1z0vmCRc=&X-Bogus=DFSzswVOGPUANtN1t69wzMSwXQM1&_signature=_02B4Z6wo00001xqLMzgAAIDDkLHDRCQFA1saizeAAKBlda"
    # timeout keeps the script from hanging forever if the server never answers
    response = requests.request("GET", url, headers=headers, timeout=10)

    info = response.text
    raw_data = json.loads(info)
 
    return raw_data    

while True:

    data = req(post_id, curs)
    same_data = parser(data)

    if same_data['has_more'] == 1:
        # Advance by one page of 20 comments
        curs += 20
        print('moving to the next cursor')
    else:
        break
    

with open('output.json', 'w', encoding='utf-8') as f:
    json.dump(comments, f, ensure_ascii=False, indent=4)

print("\ndata has been saved...., good job amir")
				
			
