YouTube Caption Scraper API

Our YouTube caption scraper pulls the timed caption lines from any public video and returns them as clean JSON, so you get the subtitles, the per-line timestamps, and the full caption text from one REST call.

1,000 free requests
2.6s median response
JSON: captions + timestamps
REST: one endpoint

the problem

Why YouTube caption data is hard to get

Captions are easy to read on the watch page and hard to export in bulk. The official captions.download method needs the youtube.force-ssl scope and edit permission, so it only works for channels you own, and the library fallbacks break the moment an IP is rate-limited or the player markup shifts.

quickstart

The YouTube Caption Scraper API in one request

cURL
curl "https://api.youtubescraperapi.com/api/v1/youtube/transcript?video_id=dQw4w9WgXcQ&api_key=$API_KEY"
Python
import requests

resp = requests.get(
    "https://api.youtubescraperapi.com/api/v1/youtube/transcript",
    params={
        "video_id": "dQw4w9WgXcQ",
        # "format": "segments",   # segments | text | both (default both)
        # "units": "seconds",     # seconds | ms (default seconds)
        "api_key": "YOUR_API_KEY",
    },
    timeout=30,  # fail instead of hanging if the connection stalls
)
resp.raise_for_status()  # surface HTTP-level failures before parsing JSON
data = resp.json()

if not data["transcript_available"]:
    # caption-less video: a reason, not an error
    raise SystemExit(f"no captions: {data['reason']}")

print(data["language_name"], "-", data["segment_count"], "caption lines",
      "(source:", data["source"] + ")")

# Write the caption lines out as an SRT file (start + duration -> cue timings)
def to_srt_time(seconds):
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

with open("captions.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(data["segments"], start=1):
        start = seg["start"]
        end = start + seg["duration"]
        f.write(f"{i}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{seg['text']}\n\n")
parameters

Parameters

| Parameter | Required | Default | Notes |
| --- | --- | --- | --- |
| video_id | required | - | The 11-character YouTube video id whose captions you want. Required unless you pass url. |
| url | optional | - | A full watch URL such as https://www.youtube.com/watch?v=dQw4w9WgXcQ. The video id is read from it when video_id is omitted. |
| format | optional | both | both, segments, or text. Return the timed caption lines, one combined text string, or both in a single call. |
| units | optional | seconds | seconds or ms. Sets the unit for each caption line's start and duration, to match your SRT/VTT or player tooling. |
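
As a rough sketch of how these parameters combine into one request URL: the helper name build_params and its validation are assumptions for illustration, not part of the API. Only the parameter names, allowed values, and defaults come from the table above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.youtubescraperapi.com/api/v1/youtube/transcript"


def build_params(api_key, video_id=None, url=None, fmt="both", units="seconds"):
    """Build the query parameters for one caption request (hypothetical helper)."""
    if not video_id and not url:
        raise ValueError("pass video_id or url")
    if fmt not in ("both", "segments", "text"):
        raise ValueError(f"unsupported format: {fmt}")
    if units not in ("seconds", "ms"):
        raise ValueError(f"unsupported units: {units}")

    params = {"api_key": api_key, "format": fmt, "units": units}
    # url is only sent when video_id is omitted, mirroring the table above
    if video_id:
        params["video_id"] = video_id
    else:
        params["url"] = url
    return params


params = build_params(
    "YOUR_API_KEY",
    url="https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    fmt="segments",
    units="ms",
)
request_url = f"{BASE_URL}?{urlencode(params)}"
print(request_url)
```

Validating format and units locally is a design choice in this sketch; it avoids spending a request on a typo.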
response

What the YouTube Caption Scraper API returns

200 OK
{
  "video_id": "dQw4w9WgXcQ",
  "language": "en",
  "language_name": "English",
  "is_generated": false,
  "source": "native",
  "format": "both",
  "units": "seconds",
  "segment_count": 2,
  "word_count": 13,
  "char_count": 57,
  "segments": [
    { "start": 18.8, "duration": 3.2, "text": "We're no strangers to love" },
    { "start": 22.0, "duration": 3.5, "text": "You know the rules and so do I" }
  ],
  "text": "We're no strangers to love You know the rules and so do I",
  "available_languages": ["en", "es", "fr", "de", "pt"],
  "transcript_available": true
}

| Field | Type | Description |
| --- | --- | --- |
| video_id | string | The video id the captions were scraped from. |
| language | string | Language code of the returned caption track, for example en. |
| language_name | string | Human-readable name of that language, for example English. |
| is_generated | boolean | True when the track is YouTube's auto-generated (ASR) captions, false for an uploaded subtitle track. |
| source | string | native for a real uploaded caption track, asr for YouTube's auto-generated speech recognition. |
| format | string | Echoes the requested format: both, segments, or text. |
| units | string | The time unit used for each caption line, seconds or ms. |
| segment_count | integer | Number of timed caption lines in the segments array. |
| word_count | integer | Total words across the caption track, handy for token and cost estimates. |
| char_count | integer | Total characters in the combined caption text. |
| segments | array | The timed caption lines, each an object of {start, duration, text}. start and duration map to SRT and VTT cue timings. Returned when format is both or segments. |
| text | string | The full caption text joined into one string, for search or LLM input. Returned when format is both or text. |
| available_languages | array | Language codes that have a caption track for this video. |
| transcript_available | boolean | True when a caption track came back. False for caption-less videos, paired with a reason instead of an error. |
| reason | string | Present only when transcript_available is false: age_restricted, members_only, transcripts_disabled, none_found, or video_unavailable. |
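
One possible way to branch on these fields is sketched below. The function name describe_response is hypothetical, and the exact shape of a caption-less response (which fields accompany reason) is an assumption, so the code reads optional keys with .get.

```python
def describe_response(data):
    """Summarize a caption response, or report why no captions came back."""
    if not data.get("transcript_available"):
        # caption-less video: the API pairs this with a reason, not an error
        return f"no captions: {data.get('reason', 'unknown')}"
    kind = "auto-generated (ASR)" if data.get("is_generated") else "uploaded"
    return f"{data['language_name']} {kind} track with {data['segment_count']} caption lines"


# Trimmed sample payloads; the second one is an assumed shape for illustration
uploaded = {
    "language_name": "English",
    "is_generated": False,
    "source": "native",
    "segment_count": 2,
    "transcript_available": True,
}
no_captions = {
    "video_id": "dQw4w9WgXcQ",
    "transcript_available": False,
    "reason": "transcripts_disabled",
}

print(describe_response(uploaded))
print(describe_response(no_captions))
```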
use cases

What you can build using the API

Build subtitle files

Turn the segments into SRT or VTT files. Each line already has a start and a duration, so the cue timings are done for you.

Feed captions to an LLM

Send the joined text field to a model to summarize a video, pull chapters, or answer questions about the spoken content.

Caption search and indexing

Store the per-line text with timestamps so users can search inside a video and jump to the exact moment a phrase is said.

Repurpose video into articles

Use the full caption text as the draft for blog posts, show notes, and social clips without retyping the audio.

Translation pipelines

Check available_languages, scrape the source track, and pass the lines to a translation step to produce subtitles in new languages.

Accessibility and compliance

Pull existing captions to review coverage, fix gaps, and keep an auditable record of the subtitle text for each video.
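
To illustrate the caption search use case above, here is a minimal sketch that scans the segments array for a phrase and builds a watch link that jumps to the matching moment. The function name search_segments is hypothetical, and the code assumes the default units of seconds.

```python
def search_segments(video_id, segments, phrase):
    """Return (start, watch_url, text) for each caption line containing phrase."""
    needle = phrase.casefold()
    hits = []
    for seg in segments:
        if needle in seg["text"].casefold():
            start = int(seg["start"])  # whole seconds for the t= link
            url = f"https://www.youtube.com/watch?v={video_id}&t={start}s"
            hits.append((seg["start"], url, seg["text"]))
    return hits


segments = [
    {"start": 18.8, "duration": 3.2, "text": "We're no strangers to love"},
    {"start": 22.0, "duration": 3.5, "text": "You know the rules and so do I"},
]
hits = search_segments("dQw4w9WgXcQ", segments, "the rules")
for start, url, text in hits:
    print(start, url, text)
```

A plain substring match keeps the sketch short; a real index would likely tokenize the text and store the start values alongside it.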
why youtubescraperapi.com

Why teams choose our YouTube Caption Scraper API

We handle the proxy rotation, anti-bot checks, and retries, then parse the track into timed segments whose start and duration map straight onto SRT and VTT cues. Responses come back in a median of 2.6s, the first 1,000 requests are free, and only successful calls are billed.

Auto and uploaded captions

We return uploaded subtitle tracks and, when none exist, fall back to YouTube's auto-generated (ASR) captions, so most spoken-word videos come back with captions. The is_generated and source (native or asr) fields tell you which one you got.

Timed segments for SRT and VTT

Each caption line carries a start and a duration, the exact inputs an SRT or VTT cue needs. The units param switches them between seconds and ms.

Lines or one text block

The format param returns timestamped segments, a single combined text string, or both in one call, so you skip splitting or stitching the subtitles yourself.

Clear no-caption reasons

Caption-less videos come back with transcript_available false and a reason (age_restricted, members_only, transcripts_disabled, none_found, video_unavailable) instead of an error to catch.

Multi-language tracks

available_languages lists every caption language on the video so you can request the track you need.

Proxy rotation and retries

Rotating residential proxies and anti-bot handling keep caption jobs running through the datacenter-IP blocks that stop local scripts.
comparison

YouTube Caption Scraper API vs the official YouTube API

| Feature | Our caption scraper | DIY (yt-dlp / library) | Official YouTube Data API |
| --- | --- | --- | --- |
| Captions for videos you do not own | Yes, any public video | Works until rate limited or markup shifts | No, captions.download needs edit permission |
| Auth | One api_key query param | None, but you manage IPs yourself | OAuth youtube.force-ssl scope |
| Proxies and anti-bot | Handled for you | You build and rotate them | Not applicable, but access to videos you do not own is blocked |
| Output | Parsed timed segments + full text | Raw timed text you parse yourself | Caption file only for owned videos |
| SRT / VTT timings | start + duration per line | Manual parse from raw track | From your own caption file only |
| Maintenance | We track YouTube changes | You fix every breakage | Tied to OAuth and quota rules |
pricing

Start free, scale when ready

| Plan | Price | Best for |
| --- | --- | --- |
| Free | 1,000 requests | Testing and small jobs |
| Pro | $0.60 / 1k | Production workloads |
| Pay-as-you-go | $0.90 / 1k | Spiky or one-off volume |

Median response 2.6s. You only pay for successful requests.

FAQ

What does this YouTube caption scraper return?

It returns the caption track of a public video as JSON: the language and language_name, an is_generated flag and a source (native or asr), a segments array of timed caption lines, a single text field with the full caption text, word and character counts, and available_languages. The segments are the subtitles, each with a start, a duration, and the text. A format param lets you ask for the segments, the combined text, or both.

Can I get captions for a video I do not own?

Yes. Our scraper reads the public caption track for any video, so you do not need to own the channel. The official YouTube Data API v3 captions.download method requires the youtube.force-ssl OAuth scope and edit permission on the video, which limits it to your own videos. That ownership gap is the reason a caption scraper exists.

Does it work with auto-generated captions?

Yes. When a video has no uploaded subtitle track, the scraper falls back to YouTube's auto-generated (ASR) captions, so most spoken-word videos still return text. The is_generated field is true and source is asr for an auto-generated track, while a human-uploaded track returns is_generated false and source native, so you can tell the two apart in your data. If a video has no captions at all, you get transcript_available false with a reason rather than an error.

How do I turn the response into an SRT or VTT file?

Each item in segments has a start and a duration. With the default units of seconds, the SRT cue start is the start value and the cue end is start plus duration, formatted as HH:MM:SS,mmm. VTT uses the same timings with a dot before the milliseconds. The Python sample on this page writes a full SRT file from one response.
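
The VTT variant described above could look roughly like this. The function names to_vtt_time and to_vtt are hypothetical, and the units argument is assumed to mirror the units request parameter so that ms responses are not multiplied twice.

```python
def to_vtt_time(value, units="seconds"):
    """Format a start or end value as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = int(round(value)) if units == "ms" else int(round(value * 1000))
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def to_vtt(segments, units="seconds"):
    """Build a WebVTT document from the segments array."""
    lines = ["WEBVTT", ""]  # required header, then a blank line before the cues
    for seg in segments:
        start = seg["start"]
        end = start + seg["duration"]  # cue end is start plus duration
        lines.append(f"{to_vtt_time(start, units)} --> {to_vtt_time(end, units)}")
        lines.append(seg["text"])
        lines.append("")
    return "\n".join(lines)


segments = [
    {"start": 18.8, "duration": 3.2, "text": "We're no strangers to love"},
    {"start": 22.0, "duration": 3.5, "text": "You know the rules and so do I"},
]
vtt = to_vtt(segments)
print(vtt)
```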

Can I scrape captions in other languages?

The available_languages array lists every language code that has a caption track for the video. Pick the language you need and scrape that track. To build subtitles in a language the video does not have, scrape the source captions and pass the text to a translation step.

How do I authenticate?

Pass your key as the api_key query parameter on each request. One key works across every endpoint, with no OAuth flow to set up.

What does the YouTube caption scraper cost?

New accounts get 1,000 free requests, Pro usage is about $0.60 per 1,000 successful requests, and pay-as-you-go top-ups are $0.90 per 1,000. There are no per-seat fees.
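
For a back-of-the-envelope budget, the arithmetic can be sketched as below. The function name estimate_cost is hypothetical, and two things are assumptions rather than stated billing rules: that the 1,000 free requests are consumed before any paid ones, and that usage is pro-rated per request rather than rounded up to blocks of 1,000.

```python
def estimate_cost(successful_requests, rate_per_1k=0.60, free_requests=1000):
    """Estimate the bill in dollars for a number of successful requests."""
    # Assumption: free requests are used up first, then the per-1k rate applies
    billable = max(0, successful_requests - free_requests)
    return round(billable * rate_per_1k / 1000, 2)


print(estimate_cost(500))                      # still inside the free allowance
print(estimate_cost(51000))                    # Pro rate
print(estimate_cost(51000, rate_per_1k=0.90))  # pay-as-you-go rate
```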

Get captions as JSON
Free plan, 1,000 requests. No credit card required.