Welcome to this exciting Python tutorial! Today, we’re diving into how to scrape Incredible Offers from the Digikala API, one of Iran’s top online retailers. Whether you’re just starting out or looking to enhance your skills, this step-by-step guide will walk you through each part of the process. 📦💻
If you’d like, you can follow the video tutorial alongside this post.
Let’s kick things off by importing the necessary libraries and setting up our base URL. This URL will serve as the foundation for our API requests:
import httpx
import json
base_url = "https://api.digikala.com/v1/incredible-offers/products/?page="
- httpx: Handles our HTTP requests.
- json: Manages JSON data.
- base_url: The base URL with a placeholder for the page number.

Before diving into pagination, let’s fetch data from the first page to understand the API’s response structure and find out how many pages of data are available:
response = httpx.get(base_url + "1", timeout=httpx.Timeout(30.0))
data = json.loads(response.text)
total_pages = data['data']['pager']['total_pages']
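Here’s roughly what that response contains. The structure below is my own sketch, trimmed to just the keys this tutorial reads; the real payload includes many more fields, and every value shown is a placeholder:

# Illustrative response shape (an assumption, heavily trimmed):
# only the keys used in this tutorial are shown, with placeholder values.
example_response = {
    "data": {
        "pager": {"total_pages": 100},            # placeholder page count
        "products": [
            {
                "id": 123456,                     # placeholder product ID
                "title_fa": "...",                # Persian product title
                "url": "...",                     # product URL data
                "images": "...",                  # product images
                "colors": "...",                  # available colors
                "default_variant": {
                    "price": {
                        "rrp_price": 1000000,     # original price
                        "selling_price": 800000,  # incredible-offer price
                    }
                },
            }
        ],
    }
}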
- httpx.get: Requests data from the API.
- json.loads: Converts the JSON response into a Python dictionary.
- total_pages: Reads the total number of pages from the response.

With the total number of pages in hand, we can now loop through each page to collect all available data:
parsed_data = []

for page in range(1, total_pages + 1):  # +1 to include the last page
    url = base_url + str(page)
    print(f"Working on {url}")
    response = httpx.get(url, timeout=httpx.Timeout(30.0))
    data = json.loads(response.text)
    products = data['data']['products']
- range(1, total_pages + 1): Iterates over every page from 1 through total_pages.
- url: Constructs the URL for the current page.
- print: Shows the current page URL for tracking.

For each page, we need to extract the relevant product details and append them to our list:
    for pro in products:
        parsed_data.append({
            'id': pro['id'],
            'title': pro['title_fa'],
            'product_url': pro['url'],
            'images': pro['images'],
            'colors': pro['colors'],
            'org_price': str(pro['default_variant']['price']['rrp_price']) + ' toman' if pro['default_variant'] else 'None',
            'incredible_price': str(pro['default_variant']['price']['selling_price']) + ' toman' if pro['default_variant'] else 'None'
        })
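One caveat worth flagging: the lookups above assume every product carries these exact keys, so a single missing key raises a KeyError. If you run into that, a more defensive variant of the loop body (my own sketch, reusing the same field names) swaps the direct indexing for .get calls:

    for pro in products:
        variant = pro.get('default_variant') or {}   # tolerate a missing variant
        price = variant.get('price') or {}           # tolerate missing price data
        parsed_data.append({
            'id': pro.get('id'),
            'title': pro.get('title_fa'),
            'product_url': pro.get('url'),
            'images': pro.get('images'),
            'colors': pro.get('colors'),
            'org_price': str(price['rrp_price']) + ' toman' if 'rrp_price' in price else 'None',
            'incredible_price': str(price['selling_price']) + ' toman' if 'selling_price' in price else 'None'
        })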
- parsed_data: The list storing product information.

Finally, we’ll save the collected data to a JSON file, making it easy to access and analyze later:
filename = 'digikala.json'

with open(filename, 'w', encoding='utf-8') as file:
    json.dump(parsed_data, file, ensure_ascii=False, indent=4)

print("Data has been saved to " + filename)
- json.dump: Writes the data to a JSON file.
- filename: Specifies the name of the output file.

Congratulations! 🎉 You’ve just learned how to scrape data from the Digikala API using Python. Here’s a quick recap: fetch the first page to discover the total page count, loop through every page, pull out each product’s details, and save everything to a JSON file.
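Once the file is written, you can load it back whenever you want to analyze the offers, for example to check how many products were collected:

import json

with open('digikala.json', encoding='utf-8') as file:
    offers = json.load(file)

print(f"Collected {len(offers)} products")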
Here’s the full code for you to copy and use:
import httpx
import json

# Base URL for API requests
base_url = "https://api.digikala.com/v1/incredible-offers/products/?page="

# Fetch data from the first page to determine the number of pages
response = httpx.get(base_url + "1", timeout=httpx.Timeout(30.0))
data = json.loads(response.text)
total_pages = data['data']['pager']['total_pages']

# List to store parsed data
parsed_data = []

# Loop through all pages and collect data
for page in range(1, total_pages + 1):  # +1 to include the last page
    url = base_url + str(page)
    print(f"Working on {url}")
    response = httpx.get(url, timeout=httpx.Timeout(30.0))
    data = json.loads(response.text)
    products = data['data']['products']
    for pro in products:
        parsed_data.append({
            'id': pro['id'],
            'title': pro['title_fa'],
            'product_url': pro['url'],
            'images': pro['images'],
            'colors': pro['colors'],
            'org_price': str(pro['default_variant']['price']['rrp_price']) + ' toman' if pro['default_variant'] else 'None',
            'incredible_price': str(pro['default_variant']['price']['selling_price']) + ' toman' if pro['default_variant'] else 'None'
        })

# Save data to a JSON file
filename = 'digikala.json'
with open(filename, 'w', encoding='utf-8') as file:
    json.dump(parsed_data, file, ensure_ascii=False, indent=4)

print("Data has been saved to " + filename)