How to Scrape YouTube Comments With Python
- Three working routes: the official YouTube Data API v3 (commentThreads.list, needs an API key), the no-key youtube-comment-downloader library, and a managed scraper API that returns parsed JSON.
- The Data API is the cleanest path. A commentThreads.list call costs 1 quota unit and the free project ceiling is 10,000 units/day, so roughly 10,000 list pages of 100 comments each.
- No API key? I ran youtube-comment-downloader against a live video in June 2026 and it returned 3 comments with author, votes, time and reply data in seconds.
- For continuous, multi-video pulls the quota cap and the scroll-parsing maintenance add up, so a hosted endpoint that takes a video URL and returns JSON is usually the cheaper path.
I scraped the comments off a YouTube video four different ways this week to settle which method I would actually reach for. The short version: a plain requests and BeautifulSoup scrape returns an almost empty page, because YouTube loads comments with JavaScript after the fact. The three approaches that returned real data were the official Data API, a no-key library called youtube-comment-downloader, and a managed scraper API.
Below is the exact Python I ran for each, what came back, and the point where each one stops being worth the effort.
What is the easiest way to scrape YouTube comments with Python?
The easiest way to scrape YouTube comments with Python is the official YouTube Data API v3, because it returns clean, structured JSON and a single commentThreads.list call costs only 1 quota unit. You request a key once, then page through comments without parsing any HTML or driving a browser.
Here are the three routes that work, plus a Selenium fallback, with the tradeoff that decides between them:
| Method | Needs API key | Output | Daily ceiling | Breaks when |
|---|---|---|---|---|
| YouTube Data API v3 | Yes (free) | Structured JSON | 10,000 quota units | You exceed the quota |
| youtube-comment-downloader | No | JSON per comment | None enforced | YouTube changes internal feed |
| Selenium + BeautifulSoup | No | Whatever you parse | None enforced | DOM markup changes |
| Managed scraper API | Provider key | Parsed JSON | Plan-dependent | Rarely (handled server-side) |
The Data API is the route I recommend first for almost everyone, so I will start there, then cover the no-key library for when you cannot or do not want to register a Google project, and finish with the managed option for continuous collection. If you want the broader picture of every YouTube data type, I keep a running guide on how to scrape YouTube.
How do you scrape YouTube comments with the YouTube Data API?
You scrape YouTube comments with the YouTube Data API by calling the commentThreads.list endpoint with a video ID and an API key, which returns top-level comments as JSON. This is the method Google sanctions, and it is the one I trust for anything I need to repeat.
First, get a key. Create a project in the Google Cloud Console, enable YouTube Data API v3, and create an API key credential. There is no OAuth step for reading public comments, just the key. Then install the official client:
pip install google-api-python-client
The script below pulls the top-level comments for one video and pages through every result. The videoId is the v= value from a watch URL.
from googleapiclient.discovery import build
API_KEY = "YOUR_API_KEY"
VIDEO_ID = "dQw4w9WgXcQ"
youtube = build("youtube", "v3", developerKey=API_KEY)
def get_comments(video_id):
    comments = []
    request = youtube.commentThreads().list(
        part="snippet",
        videoId=video_id,
        maxResults=100,  # 100 is the maximum per page
        order="relevance",  # or "time" for newest first
        textFormat="plainText",
    )
    while request is not None:
        response = request.execute()
        for item in response["items"]:
            top = item["snippet"]["topLevelComment"]["snippet"]
            comments.append({
                "author": top["authorDisplayName"],
                "text": top["textDisplay"],
                "likes": top["likeCount"],
                "published": top["publishedAt"],
            })
        # nextPageToken drives pagination until it is gone
        request = youtube.commentThreads().list_next(request, response)
    return comments
rows = get_comments(VIDEO_ID)
print(f"pulled {len(rows)} top-level comments")
print(rows[0])
The mechanics worth knowing: maxResults caps at 100 per page, and list_next follows the nextPageToken until the API stops returning one, which is how you page through every comment on a busy video. The order parameter takes relevance or time. Each page is one commentThreads.list call, and the official documentation confirms that call costs 1 quota unit and accepts maxResults values from 1 to 100.
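The script above takes a bare video ID, but in practice you usually start from a full URL. The sketch below pulls the v= value out of a watch URL with the standard library. The function name extract_video_id is my own, and the youtu.be short-link branch is an extra I added on the assumption that you will meet those links too.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment
        return parsed.path.lstrip("/").split("/")[0] or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch URLs carry the ID in the v= query parameter
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=43s"))
```

The result can be passed straight to get_comments. Returning None for anything unrecognised lets you skip bad input before it costs a quota unit.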
What does a commentThreads.list call cost in quota?
A commentThreads.list call costs 1 quota unit, and a default Google Cloud project is granted 10,000 units per day, per the YouTube Data API getting-started guide. Because each call returns up to 100 comments for that single unit, the math is generous: about 1,000,000 top-level comments a day before you hit the ceiling, if comment pages are all you spend units on.
| Call | Quota cost | Returns |
|---|---|---|
| commentThreads.list | 1 unit | Up to 100 top-level comments + reply preview |
| comments.list | 1 unit | Replies under one parent comment |
| search.list | 100 units | Search results (find video IDs) |
| Daily project ceiling | 10,000 units | Resets at midnight Pacific |
The quota cost table lists both commentThreads.list and comments.list at 1 unit each. The expensive call to watch is search.list at 100 units, so if you are discovering videos by keyword before scraping their comments, that search step burns quota 100x faster than the comment pulls do. The cap resets daily, which matters once you want replies, because replies live behind a second endpoint.
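To make the arithmetic concrete, here is a small sketch of the daily budget using only the costs from the table. The function name and the choice of 50 searches are illustrative, not measurements.

```python
DAILY_QUOTA = 10_000     # default project ceiling, in quota units
COMMENT_PAGE_COST = 1    # one commentThreads.list or comments.list page
SEARCH_COST = 100        # one search.list call
MAX_PER_PAGE = 100       # maxResults ceiling for comment pages


def max_comments_per_day(search_calls=0):
    """Upper bound on comments per day after paying for search calls."""
    remaining = max(DAILY_QUOTA - search_calls * SEARCH_COST, 0)
    pages = remaining // COMMENT_PAGE_COST
    return pages * MAX_PER_PAGE


print(max_comments_per_day())    # 1000000
print(max_comments_per_day(50))  # 500000
```

Fifty keyword searches already halve the comment budget, which is why it pays to cache discovered video IDs instead of searching again every day.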
How do you get the replies under a comment?
You get the replies under a comment by calling comments.list with the parentId set to the top-level comment’s ID, because commentThreads.list only returns a partial preview of replies. A thread with two or three replies is covered by the preview, but a thread with fifty needs the dedicated call.
def get_replies(parent_id):
    replies = []
    request = youtube.comments().list(
        part="snippet",
        parentId=parent_id,
        maxResults=100,
    )
    while request is not None:
        response = request.execute()
        for item in response["items"]:
            s = item["snippet"]
            replies.append({
                "author": s["authorDisplayName"],
                "text": s["textDisplay"],
                "likes": s["likeCount"],
            })
        request = youtube.comments().list_next(request, response)
    return replies
The parent comment ID is the id field on each item from commentThreads.list. Each comments.list page is another 1-unit call, so a video with thousands of deeply-replied threads can still drain the 10,000-unit budget faster than the top-level count suggests. That quota ceiling is the single reason people look past the official API, which is where the no-key library comes in.
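One way to avoid spending units on threads the preview already covers is to compare each thread's totalReplyCount with the number of replies returned inline. The sketch below assumes you requested part="snippet,replies" on commentThreads.list; with part="snippet" alone no replies come back inline, so every thread with at least one reply is flagged. The helper name is my own.

```python
def threads_needing_reply_calls(items):
    """Return IDs of threads whose replies are not fully included inline."""
    parent_ids = []
    for item in items:
        total = item["snippet"].get("totalReplyCount", 0)
        # "replies" is only present when the thread has inline replies
        inline = len(item.get("replies", {}).get("comments", []))
        if total > inline:
            parent_ids.append(item["id"])
    return parent_ids


# Hand-written sample items shaped like commentThreads.list results
sample = [
    {"id": "a", "snippet": {"totalReplyCount": 0}},
    {"id": "b", "snippet": {"totalReplyCount": 2},
     "replies": {"comments": [{}, {}]}},
    {"id": "c", "snippet": {"totalReplyCount": 50},
     "replies": {"comments": [{}, {}, {}, {}, {}]}},
]
print(threads_needing_reply_calls(sample))  # ['c']
```

Each returned ID can then be passed to get_replies, so the extra 1-unit calls go only to threads that need them.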
Can you scrape YouTube comments without an API key?
Yes, you can scrape YouTube comments without an API key using the youtube-comment-downloader library, which reads the same internal data feed the watch page loads. It calls no official API, so it needs no Google project, no key, and has no quota.
The library is MIT-licensed and published on PyPI. Install it directly:
pip install youtube-comment-downloader
Here is the script I ran against a public video in June 2026:
from youtube_comment_downloader import YoutubeCommentDownloader, SORT_BY_POPULAR
import itertools
downloader = YoutubeCommentDownloader()
generator = downloader.get_comments_from_url(
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    sort_by=SORT_BY_POPULAR,  # or SORT_BY_RECENT
)
# get_comments_from_url yields lazily, so slice what you need
for comment in itertools.islice(generator, 3):
    print(comment["votes"], "|", comment["time"], "|", comment["text"][:50])
It returned three comments immediately, sorted by popularity, each as a dictionary. The fields it hands back are useful out of the box: author, text, votes, time, time_parsed, cid, heart, reply, replies, channel, and photo. The votes value arrives as a display string like 255 tūkst. (locale-formatted), so plan to normalise it before any numeric analysis. Because the generator is lazy, you control volume with itertools.islice or a simple counter instead of pulling an entire 50,000-comment thread into memory at once.
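A minimal sketch of that normalisation step follows. The suffix table is an assumption of mine: it covers the English K and M abbreviations plus the Lithuanian tūkst. from my run, and you would extend it for whatever locales your requests come back in. It also assumes that a separator next to a suffix is a decimal mark and that separators in a bare number are thousands grouping.

```python
import re

# Illustrative suffix table (an assumption): extend it for your locales
SUFFIX_MULTIPLIERS = {
    "k": 1_000,
    "m": 1_000_000,
    "tūkst.": 1_000,  # Lithuanian abbreviation for "thousand"
}


def parse_votes(raw):
    """Convert a display string such as '255 tūkst.' to an int, or None."""
    text = raw.replace("\u00a0", " ").strip().lower()
    match = re.fullmatch(r"([\d.,]+)\s*(.*)", text)
    if match is None:
        return None
    number, suffix = match.group(1), match.group(2).strip()
    if not suffix:
        # No suffix: treat separators as thousands grouping ("1,234")
        digits = re.sub(r"[.,]", "", number)
        return int(digits) if digits else None
    multiplier = SUFFIX_MULTIPLIERS.get(suffix)
    if multiplier is None:
        return None  # unknown suffix: refuse to guess
    try:
        # With a suffix: treat the separator as a decimal mark ("1,2")
        value = float(number.replace(",", "."))
    except ValueError:
        return None
    return round(value * multiplier)


print(parse_votes("255 tūkst."))  # 255000
```

Returning None for an unknown suffix is deliberate: a missing value is easier to spot in analysis than a silently wrong count.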
The honest caveat: this library depends on YouTube’s internal response structure, the same ytInitialData payload the browser uses. When YouTube changes that structure, the library breaks until its maintainer ships a fix. The official API is a contract YouTube supports; this is a feed it can reshape without notice. For a one-off research pull that distinction does not matter. For a pipeline you depend on, it does, which is the same reason a raw Selenium scrape is more fragile still.
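If you do build on the library, a cheap guard is to check each yielded dictionary for the fields you rely on, so a change in YouTube's feed fails loudly instead of writing half-empty rows. This is a sketch of my own, not something the library provides, and REQUIRED_FIELDS is a hypothetical subset of the field list above that you would trim to what your pipeline reads.

```python
# Hypothetical subset of the fields listed above; adjust to your pipeline
REQUIRED_FIELDS = {"cid", "author", "text", "votes", "time"}


def missing_fields(comment, required=REQUIRED_FIELDS):
    """Return the required keys that are absent from one comment dict."""
    return sorted(required - set(comment))


def check_comments(comments, required=REQUIRED_FIELDS):
    """Yield comments unchanged, raising as soon as one lacks a field."""
    for comment in comments:
        missing = missing_fields(comment, required)
        if missing:
            raise ValueError(f"comment is missing fields: {missing}")
        yield comment
```

Because check_comments is itself a generator, it can wrap the downloader's generator directly and still be sliced with itertools.islice.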
How do you scrape YouTube comments with Selenium?
You scrape YouTube comments with Selenium by loading the watch page in a real browser, scrolling to trigger YouTube’s lazy loading, and reading each #content-text element as it appears. This is the heaviest route, and I only reach for it when I need something the API and the library both miss.
The reason plain requests plus BeautifulSoup fails here is that the comment section is rendered by JavaScript after the initial HTML arrives, so the document that requests downloads has no comment nodes to parse, a point the Scrapfly YouTube scraping guide makes as well. Selenium drives an actual Chrome instance, so the JavaScript runs and the comments populate.
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
driver = webdriver.Chrome()
driver.get("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
time.sleep(4)  # let the page settle
last_height, comments = 0, []
for _ in range(20):  # scroll up to 20 times
    driver.execute_script("window.scrollBy(0, 3000);")
    time.sleep(2)
    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
for el in driver.find_elements(By.CSS_SELECTOR, "#content-text"):
    comments.append(el.text)
print(f"scraped {len(comments)} comments")
driver.quit()
This works, but it is the most brittle option by a wide margin. The CSS selectors break whenever YouTube ships a layout change, the scroll loop is timing-dependent and flaky on slow connections, and a headless browser is heavy to run at any scale. Selenium also gives you only what is rendered, so reply text and exact vote counts take extra DOM digging. The three DIY methods all share one ceiling: they put the blocking, parsing, and pagination work on you, which is the gap a managed scraper API fills.
How do you scrape YouTube comments at scale without managing all this?
A managed scraper API removes the quota math, the scroll loops, and the feed-parsing maintenance by accepting a YouTube video URL and returning parsed comment JSON, with proxy rotation and anti-bot handled server-side. Paging is a simple cursor: each response carries a next_page_token and a ready-made next_page_url, and you follow it until has_more is false. You send requests and get structured data back, no Google project and no headless browser to babysit.
This is the path I use for continuous collection across many videos. The request shape is a single GET with your key:
curl "https://api.youtubescraperapi.com/api/v1/youtube/comments?url=https://www.youtube.com/watch?v=dQw4w9WgXcQ&api_key=$API_KEY"
That first call returns page one plus the cursor fields. To walk the whole thread in Python, follow the cursor by passing next_page_token back as page_token:
import requests
import os
API_KEY = os.environ["API_KEY"]
ENDPOINT = "https://api.youtubescraperapi.com/api/v1/youtube/comments"
def fetch_all_comments(video_url):
    comments = []
    params = {"url": video_url, "api_key": API_KEY}
    while True:
        resp = requests.get(ENDPOINT, params=params, timeout=60)
        resp.raise_for_status()
        data = resp.json()
        comments.extend(data["comments"])
        # stop once the cursor runs out
        if not data.get("has_more") or not data.get("next_page_token"):
            break
        # pass the cursor back as page_token for the next page
        params = {"page_token": data["next_page_token"], "api_key": API_KEY}
    return comments
all_comments = fetch_all_comments("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
print(f"pulled {len(all_comments)} comments")
Each page comes back with sort_applied (top or newest, set on the first call and carried inside the token) and a results_count, and every comment exposes the same fields the other methods do (author, text, like count, timestamp, reply count) without registering an app, watching a 10,000-unit budget, or rewriting selectors after a YouTube layout change. Our comment endpoint is built for exactly this, and the video endpoint covers titles and metadata if you need the surrounding context. You can get a key and run the request above on a single video to compare the output against the API and the library yourself.
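One detail worth a sketch: the curl command passes the video URL unencoded, which works for a bare watch URL but gets misread once that URL carries its own extra parameters such as &t=43s. The requests call in the Python script encodes params for you. If you build the request URL by hand, something like the following does the same job with the standard library (build_request_url is my own helper name):

```python
from urllib.parse import urlencode

ENDPOINT = "https://api.youtubescraperapi.com/api/v1/youtube/comments"


def build_request_url(video_url, api_key):
    """Return the endpoint URL with the video URL safely percent-encoded."""
    query = urlencode({"url": video_url, "api_key": api_key})
    return f"{ENDPOINT}?{query}"


print(build_request_url("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=43s", "KEY"))
```

Encoding keeps the inner & from being parsed as a separator in the outer query string, so the whole video URL arrives as one url value.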
For a one-time pull of a few videos, the no-key library is genuinely the fastest start. Once you are collecting comments daily across dozens or hundreds of videos, the quota cap and the parser maintenance turn into real ongoing cost, and offloading both is usually cheaper than your own time. Before you collect at volume, it is worth knowing where the legal line sits, which I cover in is scraping YouTube legal.
Is it legal to scrape YouTube comments?
Scraping public YouTube comments sits in a contested but generally tolerated space in the US, while YouTube’s own Terms of Service prohibit the automated access most scraping relies on. The two facts coexist, so the answer depends on what you collect and how.
YouTube’s Terms of Service state that you may not “access the Service using any automated means (such as robots, botnets or scrapers) except: (a) in the case of public search engines, in accordance with YouTube’s robots.txt file; (b) with YouTube’s prior written permission; or (c) as permitted by applicable law.” They separately prohibit collecting “any information that might identify a person (for example, harvesting usernames or faces).” Comment authors are identifiable people, and comments carry usernames, so the privacy clause is the one that bites hardest for comment data specifically.
On the US legal side, courts have generally treated scraping publicly accessible data as lawful following hiQ Labs v. LinkedIn, where the Ninth Circuit held that scraping public profiles did not violate the Computer Fraud and Abuse Act. That ruling decided a narrow question about access under one statute. It leaves a platform’s contract terms and data-protection law untouched, so it does not override YouTube’s ToS or privacy obligations like the GDPR when you store EU users’ comments. The practical takeaway: using the official Data API keeps you inside YouTube’s sanctioned access, anonymising or aggregating comment text reduces privacy exposure, and the full nuance lives in my dedicated write-up on is scraping YouTube legal.
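As a rough sketch of what reducing that exposure can look like in code, the snippet below replaces author details with a keyed hash before storage. It is an illustration under my own assumptions: the field names match the youtube-comment-downloader output described earlier, the helper names are hypothetical, and keyed hashing is pseudonymisation, not full anonymisation, so treat it as a starting point.

```python
import hashlib
import hmac

# Identity fields to drop, named after the downloader output shown earlier
IDENTITY_FIELDS = ("author", "channel", "photo")


def pseudonymise_author(author, secret_key):
    """Return a stable 16-character token for an author name."""
    digest = hmac.new(secret_key, author.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def strip_identity(comment, secret_key):
    """Copy a comment dict without identity fields, adding an author token."""
    cleaned = {k: v for k, v in comment.items() if k not in IDENTITY_FIELDS}
    cleaned["author_token"] = pseudonymise_author(
        comment.get("author", ""), secret_key
    )
    return cleaned
```

The token stays stable per author for a given key, so per-author counts still work, while the stored rows no longer carry the username itself.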
FAQ
Can you scrape YouTube comments without an API key?
Yes. The youtube-comment-downloader library reads the same internal data the watch page loads, so it returns comments without a Google API key or quota. I ran it in June 2026 and it returned comment text, author, vote count, timestamp and reply flags. The tradeoff is that it can break when YouTube changes its internal response shape, which the official API and a managed scraper API both insulate you from.
How many comments can you pull per day with the YouTube Data API?
Each commentThreads.list call costs 1 quota unit and returns up to 100 comments, and a default Google Cloud project gets 10,000 units per day. That is about 1,000,000 top-level comments per day before you hit the cap, assuming you only spend units on comment list calls. Replies and other endpoints draw from the same 10,000-unit budget.
Does the YouTube Data API return replies?
Partly. A commentThreads.list call returns top-level comments and a small preview of replies when you request the replies part. To pull every reply on a thread you call comments.list with the parent comment ID, which also costs 1 unit per call. Threads with hundreds of replies need follow-up paged requests.
Is it legal to scrape YouTube comments?
YouTube's Terms of Service prohibit accessing the Service through automated means except for public search engines following robots.txt, with YouTube's prior written permission, or as permitted by applicable law, and they prohibit collecting information that identifies a person. US courts have treated scraping public data as generally lawful after hiQ v. LinkedIn, but comment authors are people, so the privacy line matters. I cover the detail in is scraping YouTube legal.
Why do requests and BeautifulSoup return no comments?
YouTube renders comments with JavaScript after the page loads, so the initial HTML that requests fetches contains almost no comment text for BeautifulSoup to parse. You either drive a real browser with Selenium to scroll and load comments, read the internal data feed the way youtube-comment-downloader does, or call an API that returns the comments directly.