YouTube Caption Scraper API
Our YouTube caption scraper pulls the timed caption lines from any public video and returns them as clean JSON, so you get the subtitles, the per-line timestamps, and the full caption text from one REST call.
Why YouTube caption data is hard to get
Captions are easy to read on the watch page and hard to export in bulk. The official captions.download method needs the youtube.force-ssl OAuth scope and edit permission on the video, so it only works for videos you own or can edit. Library-based fallbacks such as yt-dlp break the moment an IP is rate-limited or the player markup shifts.
The YouTube Caption Scraper API in one request
```shell
curl "https://api.youtubescraperapi.com/api/v1/youtube/transcript?video_id=dQw4w9WgXcQ&api_key=$API_KEY"
```

```python
import requests

resp = requests.get(
    "https://api.youtubescraperapi.com/api/v1/youtube/transcript",
    params={
        "video_id": "dQw4w9WgXcQ",
        # "format": "segments",  # segments | text | both (default both)
        # "units": "seconds",    # seconds | ms (default seconds)
        "api_key": "YOUR_API_KEY",
    },
    timeout=30,
)
resp.raise_for_status()  # stop on any non-2xx HTTP response
data = resp.json()

if not data["transcript_available"]:
    # caption-less video: the API returns a reason, not an error
    raise SystemExit(f"no captions: {data['reason']}")

print(data["language_name"], "-", data["segment_count"], "caption lines",
      "(source:", data["source"] + ")")


# Write the caption lines out as an SRT file (start + duration -> cue timings).
# Assumes the default units=seconds and a format that includes segments.
def to_srt_time(seconds):
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


with open("captions.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(data["segments"], start=1):
        start = seg["start"]
        end = start + seg["duration"]
        f.write(f"{i}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{seg['text']}\n\n")
```

Parameters
| Parameter | Required | Default | Notes |
|---|---|---|---|
| video_id | required unless url is set | - | The 11-character YouTube video id whose captions you want. |
| url | optional | - | A full watch URL such as https://www.youtube.com/watch?v=dQw4w9WgXcQ. The video id is read from it when video_id is omitted. |
| format | optional | both | both, segments, or text: return the timed caption lines, one combined text string, or both in a single call. |
| units | optional | seconds | seconds or ms: the unit for each caption line's start and duration, to match your SRT/VTT or player tooling. |
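video_id and url are interchangeable, and format and units accept only a small fixed set of values, so it can help to validate them on the client before spending a request. The following is a minimal standard-library sketch. `build_transcript_url` is a hypothetical helper name, and its validation rules only mirror the table above.

```python
from urllib.parse import urlencode

# Endpoint taken from the request examples on this page.
ENDPOINT = "https://api.youtubescraperapi.com/api/v1/youtube/transcript"
FORMATS = {"both", "segments", "text"}
UNITS = {"seconds", "ms"}


def build_transcript_url(api_key, video_id=None, url=None,
                         fmt="both", units="seconds"):
    """Hypothetical helper: assemble a request URL from the documented params."""
    if not video_id and not url:
        raise ValueError("pass video_id or url")
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {sorted(FORMATS)}")
    if units not in UNITS:
        raise ValueError(f"units must be one of {sorted(UNITS)}")
    params = {"format": fmt, "units": units, "api_key": api_key}
    # Send video_id when we have it; fall back to the full watch URL otherwise.
    if video_id:
        params["video_id"] = video_id
    else:
        params["url"] = url
    # urlencode percent-encodes values, including the nested watch URL.
    return f"{ENDPOINT}?{urlencode(params)}"
```

For example, `build_transcript_url("YOUR_API_KEY", url="https://www.youtube.com/watch?v=dQw4w9WgXcQ", fmt="text", units="ms")` asks for the combined text only, with timings in milliseconds.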
What the YouTube Caption Scraper API returns
```json
{
  "video_id": "dQw4w9WgXcQ",
  "language": "en",
  "language_name": "English",
  "is_generated": false,
  "source": "native",
  "format": "both",
  "units": "seconds",
  "segment_count": 2,
  "word_count": 13,
  "char_count": 57,
  "segments": [
    { "start": 18.8, "duration": 3.2, "text": "We're no strangers to love" },
    { "start": 22.0, "duration": 3.5, "text": "You know the rules and so do I" }
  ],
  "text": "We're no strangers to love You know the rules and so do I",
  "available_languages": ["en", "es", "fr", "de", "pt"],
  "transcript_available": true
}
```

| Field | Type | Description |
|---|---|---|
| video_id | string | The video id the captions were scraped from. |
| language | string | Language code of the returned caption track, for example en. |
| language_name | string | Human-readable name of that language, for example English. |
| is_generated | boolean | True when the track is YouTube's auto-generated (ASR) captions, false for an uploaded subtitle track. |
| source | string | native for a human-uploaded caption track, asr for YouTube's auto-generated speech recognition. |
| format | string | Echoes the requested format: both, segments, or text. |
| units | string | The time unit used for each caption line, seconds or ms. |
| segment_count | integer | Number of timed caption lines in the segments array. |
| word_count | integer | Total words across the caption track, handy for token and cost estimates. |
| char_count | integer | Total characters in the combined caption text. |
| segments | array | The timed caption lines, each an object of {start, duration, text}. start and duration map to SRT and VTT cue timings. Returned when format is both or segments. |
| text | string | The full caption text joined into one string, for search or LLM input. Returned when format is both or text. |
| available_languages | array | Language codes that have a caption track for this video. |
| transcript_available | boolean | True when a caption track came back. False for caption-less videos, paired with a reason instead of an error. |
| reason | string | Present only when transcript_available is false: age_restricted, members_only, transcripts_disabled, none_found, or video_unavailable. |
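Because start and duration follow whichever units you requested, downstream code is simpler if it converts everything to one unit first. Here is a minimal sketch that assumes the response shape shown above. `cue_list` is a hypothetical helper. It reads the echoed units field and returns (start, end, text) tuples in seconds. It returns an empty list for a caption-less video or for a format=text response, which carries no segments array.

```python
def cue_list(data):
    """Hypothetical helper: turn a transcript response into (start_s, end_s, text) tuples."""
    if not data.get("transcript_available"):
        return []  # caption-less video; inspect data["reason"] instead
    # units is echoed in the response: ms values are divided down to seconds.
    scale = 1000.0 if data.get("units") == "ms" else 1.0
    cues = []
    for seg in data.get("segments", []):  # segments is absent when format=text
        start = seg["start"] / scale
        end = (seg["start"] + seg["duration"]) / scale
        cues.append((start, end, seg["text"]))
    return cues
```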
What you can build using the API
Build subtitle files
Feed captions to an LLM
Caption search and indexing
Repurpose video into articles
Translation pipelines
Accessibility and compliance
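For caption search and indexing, each line's start time lets you link a match straight to the moment it is spoken. Below is a minimal sketch. It assumes a response requested with units=seconds. `find_mentions` is a hypothetical helper. The deep link uses YouTube's standard t= watch-URL parameter with whole seconds.

```python
def find_mentions(data, keyword):
    """Hypothetical helper: return (start_seconds, text, deep_link) for lines containing keyword."""
    needle = keyword.casefold()  # case-insensitive match
    hits = []
    for seg in data.get("segments", []):
        if needle in seg["text"].casefold():
            t = int(seg["start"])  # floor to whole seconds for the t= parameter
            link = f"https://www.youtube.com/watch?v={data['video_id']}&t={t}s"
            hits.append((seg["start"], seg["text"], link))
    return hits
```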
Why teams choose our YouTube Caption Scraper API
We handle the proxy rotation, anti-bot checks, and retries, then parse the track into timed segments whose start and duration map straight onto SRT and VTT cues. Every call returns the same JSON shape with a 2.6s median response time. You start with 1,000 free requests and pay only for successful calls.
Auto and uploaded captions
Timed segments for SRT and VTT
Lines or one text block
Clear no-caption reasons
Multi-language tracks
Proxy rotation and retries
YouTube Caption Scraper API vs the official YouTube API
| | Our caption scraper | DIY (yt-dlp / library) | Official YouTube Data API |
|---|---|---|---|
| Captions for videos you do not own | Yes, any public video | Works until rate limited or markup shifts | No, captions.download needs edit permission |
| Auth | One api_key query param | None, but you manage IPs yourself | OAuth youtube.force-ssl scope |
| Proxies and anti-bot | Handled for you | You build and rotate them | Not needed, but other channels' captions stay off-limits |
| Output | Parsed timed segments + full text | Raw timed text you parse yourself | Caption file only for owned videos |
| SRT / VTT timings | start + duration per line | Manual parse from raw track | From your own caption file only |
| Maintenance | We track YouTube changes | You fix every breakage | Tied to OAuth and quota rules |
Start free, scale when ready
| Plan | Price | Best for |
|---|---|---|
| Free | $0 for the first 1,000 requests | Testing and small jobs |
| Pro | $0.60 / 1k | Production workloads |
| Pay-as-you-go | $0.90 / 1k | Spiky or one-off volume |
Median response 2.6s. You only pay for successful requests.
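Only successful requests are billed, so a rough estimate is the number of successful requests divided by 1,000, multiplied by the per-1k rate. The following minimal sketch uses the listed rates. `estimate_cost` is a hypothetical helper. It ignores the one-time 1,000 free requests and any plan minimums, which this page does not describe.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-1,000 rates as listed in the pricing table (USD).
RATES_PER_1K = {"pro": Decimal("0.60"), "payg": Decimal("0.90")}


def estimate_cost(successful_requests, plan):
    """Hypothetical helper: estimated USD cost, billing only successful requests."""
    rate = RATES_PER_1K[plan]
    cost = Decimal(successful_requests) / 1000 * rate
    # Round to cents; Decimal avoids binary floating-point drift on money.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

At these rates, 50,000 successful calls come to $30.00 on Pro and $45.00 on pay-as-you-go.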
FAQ
What does the YouTube caption scraper return?
It returns the caption track of a public video as JSON: the language and language_name, an is_generated flag and a source (native or asr), a segments array of timed caption lines, a single text field with the full caption text, word and character counts, and available_languages. The segments are the subtitles, each with a start, a duration, and the text. A format param lets you ask for the segments, the combined text, or both.
Can I scrape captions from videos I do not own?
Yes. Our scraper reads the public caption track for any video, so you do not need to own the channel. The official YouTube Data API v3 captions.download method requires the youtube.force-ssl OAuth scope and edit permission on the video, which limits it to your own videos. That ownership gap is the reason a caption scraper exists.
Does it return auto-generated captions?
Yes. When a video has no uploaded subtitle track, the scraper falls back to YouTube's auto-generated (ASR) captions, so most spoken-word videos still return text. The is_generated field is true and source is asr for an auto-generated track, while a human-uploaded track returns is_generated false and source native, so you can tell the two apart in your data. If a video has no captions at all, you get transcript_available false with a reason rather than an error.
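When you scrape many videos, tallying source and reason shows at a glance how much of a batch is human-uploaded, auto-generated, or unavailable. The following minimal sketch works over responses you have already fetched. `summarize_batch` is a hypothetical helper that relies only on the documented transcript_available, source, and reason fields.

```python
from collections import Counter


def summarize_batch(responses):
    """Hypothetical helper: count native vs asr tracks and no-caption reasons."""
    tally = Counter()
    for data in responses:
        if data.get("transcript_available"):
            tally[data["source"]] += 1  # "native" or "asr"
        else:
            # Prefix reasons so they cannot collide with source values.
            tally[f"no_captions:{data.get('reason', 'unknown')}"] += 1
    return tally
```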
How do I turn the segments into SRT or VTT subtitles?
Each item in segments has a start and a duration. With the default units of seconds, the SRT cue start is the start value and the cue end is start plus duration, formatted as HH:MM:SS,mmm. VTT uses the same timings with a dot before the milliseconds. The Python sample on this page writes a full SRT file from one response.
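A WebVTT file differs from SRT mainly in its WEBVTT header line, its optional cue numbers, and the dot before the milliseconds. Here is a minimal sketch that assumes units=seconds. `to_vtt` and `vtt_time` are hypothetical helpers, and `to_vtt` returns the file contents as a string.

```python
def vtt_time(seconds):
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def to_vtt(segments):
    """Build WebVTT file contents from {start, duration, text} segments."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for seg in segments:
        start = seg["start"]
        end = start + seg["duration"]
        lines.append(f"{vtt_time(start)} --> {vtt_time(end)}")
        lines.append(seg["text"])
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```

To save it, write `to_vtt(data["segments"])` to a .vtt file opened with encoding="utf-8".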
Can I get captions in other languages?
The available_languages array lists every language code that has a caption track for the video. Pick the language you need and scrape that track. To build subtitles in a language the video does not have, scrape the source captions and pass the text to a translation step.
How do I authenticate?
Pass your key as the api_key query parameter on each request. One key works across every endpoint, with no OAuth flow to set up.
How much does it cost?
New accounts get 1,000 free requests, Pro usage is about $0.60 per 1,000 successful requests, and pay-as-you-go top-ups are $0.90 per 1,000. There are no per-seat fees.