YouTube Transcript Scraper API
Our YouTube transcript API takes a video ID and returns the full transcript as clean JSON: timestamped segments, the language, whether the captions are auto-generated or human, and the complete text in one field. Proxies, anti-bot, and rendering are handled, so one request with your API key is the whole integration.
Why YouTube Transcript data is hard to get
The official YouTube Data API v3 only returns caption text for videos you own: captions.download needs OAuth and edit permission on the video, so it returns a 403 for anyone else’s public video. The open-source libraries that read the public caption track work locally but throw IpBlocked when run from cloud IPs. That is the gap a hosted YouTube transcript API fills.
The YouTube Transcript Scraper API in one request
curl "https://api.youtubescraperapi.com/api/v1/youtube/transcript?video_id=dQw4w9WgXcQ&api_key=$API_KEY"

import os
import requests

resp = requests.get(
    "https://api.youtubescraperapi.com/api/v1/youtube/transcript",
    params={
        "video_id": "dQw4w9WgXcQ",  # YouTube video id
        # "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",  # or pass the full watch URL
        # "format": "both",  # both | segments | text (default both)
        # "units": "seconds",  # seconds | ms (default seconds)
        "api_key": os.environ["API_KEY"],
    },
    timeout=30,
)
data = resp.json()
if not data["transcript_available"]:
    # no captions: a reason, not an error (age_restricted, members_only,
    # transcripts_disabled, none_found, video_unavailable)
    print("no transcript:", data["reason"])
else:
    print(data["language_name"], "| auto-generated:", data["is_generated"], "| source:", data["source"])
    print(data["segment_count"], "segments,", data["word_count"], "words")
    for seg in data["segments"][:3]:
        print(f"{seg['start']:.2f}s {seg['text']!r}")
    # full transcript as one string, ready for an LLM
    transcript = data["text"]

Parameters
| Parameter | Required | Default | Notes |
|---|---|---|---|
| video_id | required | - | YouTube video id, e.g. dQw4w9WgXcQ. Required unless you pass url. |
| url | optional | - | Full watch URL as an alternative to the id |
| format | optional | both | both, segments, or text. Choose timestamped segments, one combined text string, or both. |
| units | optional | seconds | seconds or ms. Sets the unit for each segment's start and duration. |
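As a rough sketch of how the parameters in the table fit together, the helper below assembles the query dictionary and rejects values the table does not list. The name `build_params` is hypothetical, and preferring `video_id` when both `video_id` and `url` are supplied is an assumption made for illustration, not documented behavior.

```python
from typing import Optional

# Allowed values taken from the parameter table above.
ALLOWED_FORMATS = {"both", "segments", "text"}
ALLOWED_UNITS = {"seconds", "ms"}


def build_params(
    api_key: str,
    video_id: Optional[str] = None,
    url: Optional[str] = None,
    fmt: str = "both",
    units: str = "seconds",
) -> dict:
    """Build the query parameters for the transcript endpoint (hypothetical helper)."""
    if not video_id and not url:
        raise ValueError("pass either video_id or url")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    if units not in ALLOWED_UNITS:
        raise ValueError(f"units must be one of {sorted(ALLOWED_UNITS)}")

    params = {"api_key": api_key, "format": fmt, "units": units}
    # Assumption: when both identifiers are given, send only video_id.
    if video_id:
        params["video_id"] = video_id
    else:
        params["url"] = url
    return params
```

The returned dictionary can be passed as `params=` to `requests.get`, as in the request example above.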
What the YouTube Transcript Scraper API returns
{
"video_id": "dQw4w9WgXcQ",
"language": "en",
"language_name": "English",
"is_generated": false,
"source": "native",
"format": "both",
"units": "seconds",
"segment_count": 61,
"word_count": 487,
"char_count": 2089,
"segments": [
{
"start": 18.64,
"duration": 3.6,
"text": "We're no strangers to love"
},
{
"start": 22.24,
"duration": 3.84,
"text": "You know the rules and so do I"
}
],
"text": "We're no strangers to love You know the rules and so do I ...",
"available_languages": ["en", "es", "fr", "de", "pt", "ja"],
"transcript_available": true
}

| Field | Type | Description |
|---|---|---|
| video_id | string | The video id the transcript belongs to |
| language | string | Language code of the returned transcript, e.g. en |
| language_name | string | Human-readable language name, e.g. English |
| is_generated | boolean | true for auto-generated (ASR) captions, false for human-uploaded |
| source | string | native when the captions are a real uploaded track, asr when they are YouTube's auto-generated speech recognition |
| format | string | Echoes the requested format: both, segments, or text |
| units | string | The time unit used for each segment, seconds or ms |
| segment_count | integer | Number of timed segments in the transcript |
| word_count | integer | Total words across the transcript, handy for token and cost estimates |
| char_count | integer | Total characters in the combined transcript text |
| segments | array | Timed lines, each with start, duration, and text. Returned when format is both or segments. |
| text | string | The full transcript joined into one string. Returned when format is both or text. |
| available_languages | array | Language codes that have a caption track for this video |
| transcript_available | boolean | true when captions came back. false for caption-less videos, paired with a reason instead of an error. |
| reason | string | Present only when transcript_available is false: age_restricted, members_only, transcripts_disabled, none_found, or video_unavailable |
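Because `segments` and `text` are each returned only for certain `format` values, client code may want one accessor that works for all three. The sketch below is one way to do that. The name `transcript_text` is hypothetical, and joining segments with a single space is an assumption based on how the sample `text` field looks, not a documented guarantee.

```python
from typing import Optional


def transcript_text(data: dict) -> Optional[str]:
    """Return the transcript as one string, or None when no transcript came back."""
    if not data.get("transcript_available"):
        # Caption-less videos carry a `reason` field instead of text.
        return None
    if "text" in data:
        # Present when format is "both" or "text".
        return data["text"]
    # format == "segments": rebuild the text from the timed lines.
    # Assumption: a single space matches how the API joins segments.
    return " ".join(seg["text"] for seg in data.get("segments", []))
```

Callers can then branch on `None` and read `data["reason"]` to decide what to do with caption-less videos.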
What you can build using the API
AI summaries and RAG
Searchable video archives
Subtitles and captions
Content repurposing
Multilingual workflows
Research and analytics
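For the summary and RAG use case, timestamped segments are usually grouped into larger chunks before embedding or prompting. The following is a minimal sketch of that grouping, not part of the API: `chunk_segments` and the `max_words` budget are hypothetical, and word counts use a plain whitespace split.

```python
from typing import Dict, List


def chunk_segments(segments: List[dict], max_words: int = 200) -> List[Dict]:
    """Group consecutive segments into chunks of at most `max_words` words.

    Each chunk keeps the `start` of its first segment so a retrieved passage
    can link back to the matching moment in the video. A single segment longer
    than `max_words` becomes its own chunk.
    """
    chunks: List[Dict] = []
    texts: List[str] = []
    start = None
    count = 0
    for seg in segments:
        n = len(seg["text"].split())
        # Flush the current chunk if adding this segment would exceed the budget.
        if texts and count + n > max_words:
            chunks.append({"start": start, "text": " ".join(texts)})
            texts, start, count = [], None, 0
        if start is None:
            start = seg["start"]
        texts.append(seg["text"])
        count += n
    if texts:
        chunks.append({"start": start, "text": " ".join(texts)})
    return chunks
```

Keeping the start time on each chunk is a design choice: it costs one extra field and lets search results deep-link into the video.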
Why teams choose our YouTube Transcript Scraper API
Our YouTube transcript API runs on our infrastructure: rotating residential proxies, anti-bot and CAPTCHA handling, and JS rendering behind one youtube/transcript call that returns validated JSON in about 2.6 seconds. The free plan covers 1,000 requests, and you only pay for successful calls.
Segments or one text block
Auto-caption (ASR) fallback
Clear no-caption reasons
Seconds or milliseconds
Proxy rotation and anti-bot
Pay for success
YouTube Transcript Scraper API vs the official YouTube API
| Approach | Setup | Maintenance | Transcript text | Limits |
|---|---|---|---|---|
| requests + parsing | high | constant | yes, if it loads | IpBlocked on cloud IPs |
| Headless browser (Playwright) | high | high | yes | slow, rate-limited |
| YouTube Data API v3 | OAuth setup | low | owner videos only | 403 for others, 10k units/day |
| Our transcript API | one call | none | any public video | handled, pay per success |
Start free, scale when ready
| Plan | Price | Best for |
|---|---|---|
| Free | 1,000 requests | Testing and small jobs |
| Pro | $0.60 / 1k | Production workloads |
| Pay-as-you-go | $0.90 / 1k | Spiky or one-off volume |
Median response 2.6s. You only pay for successful requests.
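To make the per-1,000 prices concrete, the sketch below estimates a bill from a count of successful requests. It is a back-of-the-envelope calculation, not a billing implementation. `estimate_cost` and the plan keys `pro` and `payg` are hypothetical names, and the code assumes the free 1,000 requests are not deducted from paid usage, which the pricing table does not specify.

```python
from decimal import Decimal

# Prices per 1,000 successful requests, from the plan table above.
PRICE_PER_1K = {
    "pro": Decimal("0.60"),
    "payg": Decimal("0.90"),
}


def estimate_cost(successful_requests: int, plan: str) -> Decimal:
    """Estimate the cost in dollars for a number of successful requests.

    Failed requests are left out because only successful calls are billed.
    Assumption: the free allowance is not subtracted here.
    """
    if successful_requests < 0:
        raise ValueError("successful_requests must be non-negative")
    if plan not in PRICE_PER_1K:
        raise ValueError(f"unknown plan: {plan!r}")
    cost = Decimal(successful_requests) * PRICE_PER_1K[plan] / Decimal(1000)
    return cost.quantize(Decimal("0.01"))
```

For example, 50,000 successful requests come to $30.00 at the Pro rate and $45.00 at the pay-as-you-go rate.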
FAQ
A YouTube transcript API is an endpoint that takes a video ID and returns the video's transcript as structured data. Ours reads the public caption track and gives you back timestamped segments, the language, whether the captions are auto-generated, and the full transcript text in one JSON response, typically in about 2.6 seconds. A format param lets you choose timestamped segments, one combined text string, or both.
No. The YouTube Data API v3 only downloads captions for videos you own, through a captions.download method that requires OAuth with the youtube.force-ssl scope and edit permission on the video. It returns a 403 for any public video you do not control, so it cannot fetch transcripts for arbitrary videos. That gap is the reason a dedicated transcript API exists.
You can get a transcript for any YouTube video that has a caption track, whether the creator uploaded it or YouTube auto-generated it. When no human captions exist, the endpoint falls back to the auto-generated (ASR) track, returning source: "asr" and is_generated: true, so most spoken-word videos work. If a video genuinely has no captions, the response is not an error: it comes back with transcript_available: false and a reason such as none_found, transcripts_disabled, members_only, age_restricted, or video_unavailable.
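When processing many videos, it can help to tally how each response resolved: a human-uploaded track, the ASR fallback, or one of the no-caption reasons. The sketch below does that over already-fetched response dictionaries. `tally_outcomes` is a hypothetical helper, and bucketing unrecognized reasons under `unknown` is a defensive assumption rather than documented behavior.

```python
from collections import Counter
from typing import Iterable

# Reasons listed in the response reference.
KNOWN_REASONS = {
    "age_restricted",
    "members_only",
    "transcripts_disabled",
    "none_found",
    "video_unavailable",
}


def tally_outcomes(responses: Iterable[dict]) -> Counter:
    """Count responses by caption source, or by reason when no transcript exists."""
    counts: Counter = Counter()
    for data in responses:
        if data.get("transcript_available"):
            # `source` is "native" for uploaded tracks and "asr" for auto-generated ones.
            counts[data.get("source", "unknown")] += 1
        else:
            reason = data.get("reason")
            counts[reason if reason in KNOWN_REASONS else "unknown"] += 1
    return counts
```

A high `asr` share suggests most of a channel relies on auto-generated captions, which may matter if accuracy is important downstream.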
Each response includes an available_languages array listing the language codes that have a caption track for that video. Read that field to see what exists. The response also returns the chosen track's language and language_name, so you know exactly which transcript you received.
No. You do not need a Google Cloud project, an OAuth flow, or your own proxy pool. Proxy rotation, anti-bot handling, and retries run on our servers. You send one request with your API key and the video ID, and the parsed transcript comes back.
The free plan includes 1,000 requests. Paid usage runs about $0.60 per 1,000 on the Pro plan, or $0.90 per 1,000 pay-as-you-go. Failed requests are never charged, so you only pay for transcripts that actually come back.