
YouTube Caption Scraper API

Our YouTube caption scraper pulls the timed caption lines from any public video and returns them as clean JSON, so you get the subtitles, the per-line timestamps, and the full caption text from one REST call.

Get a free API key All endpoints

1,000 free requests

2.6s median response

JSON: captions + timestamps

REST: one endpoint

the problem

Why YouTube caption data is hard to get

Captions are easy to read on the watch page and hard to export in bulk. The official YouTube Data API v3 captions.download method requires the youtube.force-ssl OAuth scope and edit permission on the video, so it only works for videos you own. DIY fallbacks such as yt-dlp or scraping libraries break as soon as an IP is rate-limited or the player markup changes.

quickstart

The YouTube Caption Scraper API in one request

cURL

curl "https://api.youtubescraperapi.com/api/v1/youtube/transcript?video_id=dQw4w9WgXcQ&api_key=$API_KEY"

Python

import requests

resp = requests.get(
    "https://api.youtubescraperapi.com/api/v1/youtube/transcript",
    params={
        "video_id": "dQw4w9WgXcQ",
        # "format": "segments",   # segments | text | both (default both)
        # "units": "seconds",     # seconds | ms (default seconds)
        "api_key": "YOUR_API_KEY",
    },
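    timeout=30,  # avoid hanging forever on a stalled connection
)
resp.raise_for_status()  # fail fast on HTTP errors before parsing JSON
data = resp.json()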

if not data["transcript_available"]:
    # caption-less video: a reason, not an error
    raise SystemExit(f"no captions: {data['reason']}")

print(data["language_name"], "-", data["segment_count"], "caption lines",
      "(source:", data["source"] + ")")

# Write the caption lines out as an SRT file (start + duration -> cue timings)
def to_srt_time(seconds):
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

with open("captions.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(data["segments"], start=1):
        start = seg["start"]
        end = start + seg["duration"]
        f.write(f"{i}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{seg['text']}\n\n")

parameters

Parameters

Parameter	Required	Default	Notes
`video_id`	required (or `url`)	-	The 11-character YouTube video id whose captions you want. May be omitted only when url is passed.
`url`	optional	-	A full watch URL such as https://www.youtube.com/watch?v=dQw4w9WgXcQ. The video id is read from it when video_id is omitted.
`format`	optional	`both`	both, segments, or text. Return the timed caption lines, one combined text string, or both in a single call.
`units`	optional	`seconds`	seconds or ms. Sets the unit for each caption line's start and duration, to match your SRT/VTT or player tooling.
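
The parameters above combine into one query string. As a minimal sketch, the helper below builds the request URL from them and rejects values the table does not list. The name build_caption_url is hypothetical and not part of the API; the endpoint URL and parameter names are taken from the quickstart.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.youtubescraperapi.com/api/v1/youtube/transcript"


def build_caption_url(api_key, video_id=None, url=None, fmt="both", units="seconds"):
    """Build the GET URL for one caption request (hypothetical helper)."""
    if not video_id and not url:
        raise ValueError("pass either video_id or url")
    if fmt not in ("both", "segments", "text"):
        raise ValueError(f"unsupported format: {fmt}")
    if units not in ("seconds", "ms"):
        raise ValueError(f"unsupported units: {units}")

    params = {"format": fmt, "units": units, "api_key": api_key}
    # The table says url is only read when video_id is omitted,
    # so video_id takes priority here.
    if video_id:
        params["video_id"] = video_id
    else:
        params["url"] = url
    return f"{BASE_URL}?{urlencode(params)}"
```

urlencode percent-encodes the watch URL, so a full link can be passed as the url value without breaking the query string.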

response

What the YouTube Caption Scraper API returns

200 OK

{
  "video_id": "dQw4w9WgXcQ",
  "language": "en",
  "language_name": "English",
  "is_generated": false,
  "source": "native",
  "format": "both",
  "units": "seconds",
  "segment_count": 2,
  "word_count": 12,
  "char_count": 56,
  "segments": [
    { "start": 18.8, "duration": 3.2, "text": "We're no strangers to love" },
    { "start": 22.0, "duration": 3.5, "text": "You know the rules and so do I" }
  ],
  "text": "We're no strangers to love You know the rules and so do I",
  "available_languages": ["en", "es", "fr", "de", "pt"],
  "transcript_available": true
}

Field	Type	Description
`video_id`	string	The video id the captions were scraped from.
`language`	string	Language code of the returned caption track, for example en.
`language_name`	string	Human-readable name of that language, for example English.
`is_generated`	boolean	True when the track is YouTube's auto-generated (ASR) captions, false for an uploaded subtitle track.
`source`	string	native for a real uploaded caption track, asr for YouTube's auto-generated speech recognition.
`format`	string	Echoes the requested format: both, segments, or text.
`units`	string	The time unit used for each caption line, seconds or ms.
`segment_count`	integer	Number of timed caption lines in the segments array.
`word_count`	integer	Total words across the caption track, handy for token and cost estimates.
`char_count`	integer	Total characters in the combined caption text.
`segments`	array	The timed caption lines, each an object of {start, duration, text}. start and duration map to SRT and VTT cue timings. Returned when format is both or segments.
`text`	string	The full caption text joined into one string, for search or LLM input. Returned when format is both or text.
`available_languages`	array	Language codes that have a caption track for this video.
`transcript_available`	boolean	True when a caption track came back. False for caption-less videos, paired with a reason instead of an error.
`reason`	string	Present only when transcript_available is false: age_restricted, members_only, transcripts_disabled, none_found, or video_unavailable.
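
Because segments, text, and reason are each present only in some responses, it can help to normalize the JSON before using it. The sketch below is one way to do that, assuming the field names in the table above; CaptionResult and parse_caption_response are hypothetical names, and whether video_id is echoed on a no-caption response is an assumption, so it is read defensively.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CaptionResult:
    video_id: str
    available: bool
    reason: Optional[str] = None        # set only when available is False
    language: Optional[str] = None
    is_generated: Optional[bool] = None
    segments: list = field(default_factory=list)
    text: str = ""


def parse_caption_response(data: dict) -> CaptionResult:
    """Normalize one decoded JSON response into a CaptionResult."""
    if not data.get("transcript_available"):
        # Caption-less video: a reason is returned instead of an error.
        return CaptionResult(
            video_id=data.get("video_id", ""),
            available=False,
            reason=data.get("reason"),
        )
    return CaptionResult(
        video_id=data["video_id"],
        available=True,
        language=data.get("language"),
        is_generated=data.get("is_generated"),
        segments=data.get("segments", []),  # absent when format is text
        text=data.get("text", ""),          # absent when format is segments
    )
```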

use cases

What you can build using the API

Build subtitle files

Turn the segments into SRT or VTT files. Each line already has a start and a duration, so the cue timings are done for you.

Feed captions to an LLM

Send the joined text field to a model to summarize a video, pull chapters, or answer questions about the spoken content.
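
For long videos the joined text may exceed a model's context window. One possible approach, sketched below, is to group the timed segments into chunks under a word budget while keeping each chunk's start time. chunk_segments and the max_words default are illustrative assumptions, and a word count is only a rough proxy for tokens.

```python
def chunk_segments(segments, max_words=200):
    """Group caption segments into chunks of at most roughly max_words words.

    Each chunk keeps the start time of its first segment so a summary
    can be linked back to a position in the video.
    """
    chunks, current, count = [], [], 0

    def flush():
        chunks.append({
            "start": current[0]["start"],
            "text": " ".join(seg["text"] for seg in current),
        })

    for seg in segments:
        n = len(seg["text"].split())
        # Close the current chunk before it would exceed the budget.
        if current and count + n > max_words:
            flush()
            current, count = [], 0
        current.append(seg)
        count += n

    if current:
        flush()
    return chunks
```

A single segment longer than max_words still becomes its own chunk rather than being split mid-line.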

Caption search and indexing

Store the per-line text with timestamps so users can search inside a video and jump to the exact moment a phrase is said.
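
A minimal sketch of that lookup, assuming segments shaped like the response above: it does a case-insensitive substring match per caption line and builds a watch link with YouTube's t parameter. find_phrase is a hypothetical helper, and a real index would more likely live in a search engine or database.

```python
def find_phrase(segments, phrase, video_id, units="seconds"):
    """Return caption lines containing phrase, each with a jump-to link."""
    needle = phrase.casefold()
    hits = []
    for seg in segments:
        if needle in seg["text"].casefold():
            # Normalize to whole seconds for the t= URL parameter.
            start = seg["start"] / 1000 if units == "ms" else seg["start"]
            hits.append({
                "start": seg["start"],
                "text": seg["text"],
                "url": f"https://www.youtube.com/watch?v={video_id}&t={int(start)}s",
            })
    return hits
```

This only matches phrases that sit inside a single caption line; a phrase split across two lines needs matching against the joined text instead.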

Repurpose video into articles

Use the full caption text as the draft for blog posts, show notes, and social clips without retyping the audio.

Translation pipelines

Check available_languages, scrape the source track, and pass the lines to a translation step to produce subtitles in new languages.

Accessibility and compliance

Pull existing captions to review coverage, fix gaps, and keep an auditable record of the subtitle text for each video.

why youtubescraperapi.com

Why teams choose our YouTube Caption Scraper API

We handle the proxy rotation, anti-bot checks, and retries, then parse the track into timed segments whose start and duration map straight onto SRT and VTT cues. Responses come back at a 2.6s median, the first 1,000 requests are free, and only successful calls are billed.

Auto and uploaded captions

We return uploaded subtitle tracks and, when none exist, fall back to YouTube's auto-generated (ASR) captions. is_generated and source (native or asr) tell you which one you got, so most spoken-word videos come back with captions.

Timed segments for SRT and VTT

Each caption line carries a start and a duration, the exact inputs an SRT or VTT cue needs. The units param switches them between seconds and ms.

Lines or one text block

The format param returns timestamped segments, a single combined text string, or both in one call, so you skip splitting or stitching the subtitles yourself.

Clear no-caption reasons

Caption-less videos come back with transcript_available false and a reason (age_restricted, members_only, transcripts_disabled, none_found, video_unavailable) instead of an error to catch.
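
In a batch job, this means caption-less videos can be counted rather than caught. The sketch below tallies outcomes across a list of decoded responses, using the five reason values listed above; tally_outcomes is a hypothetical name, and the "unknown" bucket is a defensive assumption for any value not in that list.

```python
from collections import Counter

KNOWN_REASONS = {
    "age_restricted",
    "members_only",
    "transcripts_disabled",
    "none_found",
    "video_unavailable",
}


def tally_outcomes(responses):
    """Count successful responses and each no-caption reason in a batch."""
    counts = Counter()
    for data in responses:
        if data.get("transcript_available"):
            counts["ok"] += 1
        else:
            reason = data.get("reason")
            # Anything outside the documented list is grouped as unknown.
            counts[reason if reason in KNOWN_REASONS else "unknown"] += 1
    return counts
```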

Multi-language tracks

available_languages lists every caption language on the video so you can request the track you need.

Proxy rotation and retries

Rotating residential proxies and anti-bot handling keep caption jobs running through the datacenter-IP blocks that stop local scripts.

comparison

YouTube Caption Scraper API vs DIY scraping and the official YouTube API

	Our caption scraper	DIY (yt-dlp / library)	Official YouTube Data API
Captions for videos you do not own	Yes, any public video	Works until rate limited or markup shifts	No, captions.download needs edit permission
Auth	One api_key query param	None, but you manage IPs yourself	OAuth youtube.force-ssl scope
Proxies and anti-bot	Handled for you	You build and rotate them	Not applicable; captions.download is refused for videos you do not own
Output	Parsed timed segments + full text	Raw timed text you parse yourself	Caption file only for owned videos
SRT / VTT timings	start + duration per line	Manual parse from raw track	From your own caption file only
Maintenance	We track YouTube changes	You fix every breakage	Tied to OAuth and quota rules

pricing

Start free, scale when ready

Plan	Price	Best for
Free	1,000 requests	Testing and small jobs
Pro	$0.60 / 1k	Production workloads
Pay-as-you-go	$0.90 / 1k	Spiky or one-off volume

Median response 2.6s. You only pay for successful requests.

FAQ

What does this YouTube caption scraper return?

It returns the caption track of a public video as JSON: the language and language_name, an is_generated flag and a source (native or asr), a segments array of timed caption lines, a single text field with the full caption text, word and character counts, and available_languages. The segments are the subtitles, each with a start, a duration, and the text. A format param lets you ask for the segments, the combined text, or both.

Can I get captions for a video I do not own?

Yes. Our scraper reads the public caption track for any video, so you do not need to own the channel. The official YouTube Data API v3 captions.download method requires the youtube.force-ssl OAuth scope and edit permission on the video, which limits it to your own videos. That ownership gap is the reason a caption scraper exists.

Does it work with auto-generated captions?

Yes. When a video has no uploaded subtitle track, the scraper falls back to YouTube's auto-generated (ASR) captions, so most spoken-word videos still return text. The is_generated field is true and source is asr for an auto-generated track, while a human-uploaded track returns is_generated false and source native, so you can tell the two apart in your data. If a video has no captions at all, you get transcript_available false with a reason rather than an error.

How do I turn the response into an SRT or VTT file?

Each item in segments has a start and a duration. With the default units of seconds, the SRT cue start is the start value and the cue end is start plus duration, formatted as HH:MM:SS,mmm. VTT uses the same timings with a dot before the milliseconds. The Python sample on this page writes a full SRT file from one response.
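
The VTT variant can be sketched the same way. The example below assumes segments shaped like the response above and also handles units of ms by scaling back to seconds; segments_to_vtt and to_vtt_time are hypothetical helper names. A WebVTT file must begin with a WEBVTT line, which the sketch writes first.

```python
def to_vtt_time(seconds):
    """Format seconds as HH:MM:SS.mmm (VTT uses a dot before the milliseconds)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def segments_to_vtt(segments, units="seconds"):
    """Render caption segments as the text of a WebVTT file."""
    scale = 1000.0 if units == "ms" else 1.0
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for seg in segments:
        start = seg["start"] / scale
        end = (seg["start"] + seg["duration"]) / scale
        lines.append(f"{to_vtt_time(start)} --> {to_vtt_time(end)}")
        lines.append(seg["text"])
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```

Writing the returned string to a .vtt file with UTF-8 encoding gives a file most HTML5 players accept as a track source.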

Can I scrape captions in other languages?

The available_languages array lists every language code that has a caption track for the video. Pick the language you need and scrape that track. To build subtitles in a language the video does not have, scrape the source captions and pass the text to a translation step.

How do I authenticate?

Pass your key as the api_key query parameter on each request. One key works across every endpoint, with no OAuth flow to set up.

What does the YouTube caption scraper cost?

New accounts get 1,000 free requests, Pro usage is about $0.60 per 1,000 successful requests, and pay-as-you-go top-ups are $0.90 per 1,000. There are no per-seat fees.
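
As a rough worked example of those rates, the sketch below estimates a bill from a count of successful requests. estimate_cost is a hypothetical helper, and it assumes the 1,000 free requests are deducted before paid billing starts, which this page does not state; set free_requests to 0 if that does not hold for your account.

```python
def estimate_cost(successful_requests, rate_per_1k=0.60, free_requests=1000):
    """Estimate cost in dollars for a number of successful requests.

    rate_per_1k defaults to the Pro rate quoted above; pass 0.90 for
    pay-as-you-go. The free_requests deduction is an assumption.
    """
    billable = max(0, successful_requests - free_requests)
    return round(billable / 1000 * rate_per_1k, 2)
```

Under these assumptions, 51,000 successful requests on Pro come to 50 billable thousands at $0.60 each.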

Get YouTube captions as JSON

Free plan, 1,000 requests. No credit card required.
