Show HN: Free API to extract PDF data

Hi HN,

Like everyone, I’m working on a product that uses LLMs to extract data from photos and documents. Part of the processing pipeline is extracting data from PDFs as raw text or a raster image.

As part of our lead-gen strategy, we’ve opened up our REST API, which lets you process individual pages of a PDF. The API is completely free to use anonymously, but anonymous requests are rate limited to 1 page per 30 seconds. Creating a free account removes this restriction.
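If you call the API anonymously from a script, it may help to space requests out on the client side so you stay under the 1-page-per-30-seconds limit. Here is a minimal sketch; the `PageThrottle` class and its parameters are hypothetical names for illustration and are not part of the API.

```python
import time


class PageThrottle:
    """Spaces out calls so at most one happens per `interval` seconds.

    Hypothetical helper: the 30-second default mirrors the anonymous
    rate limit described in the post, nothing more.
    """

    def __init__(self, interval=30.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock   # injectable so the helper can be tested
        self._sleep = sleep
        self._last = None     # time of the previous allowed call

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        # Record when this call is actually allowed to proceed.
        self._last = now + delay
        return delay
```

Call `throttle.wait()` before each page request. The first call returns immediately, and later calls sleep only for whatever remains of the 30-second window.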

The two endpoints are:

https://extract.dev/api/pages/extract/raster – Rasterize a page of a PDF

https://extract.dev/api/pages/extract/text – Extract text from a page of a PDF

Both have the same request format:

    {
        "file": "https://assets.extract-cdn.com/data/hd-receipt.pdf",
        "page": 1
    }
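As a rough sketch, the request body above could be sent from Python like this. It assumes the endpoints accept an HTTP POST with a JSON body and a `Content-Type: application/json` header, which the post does not state; check https://extract.dev/docs for the actual method and response format. `build_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://extract.dev/api/pages/extract"


def build_request(kind, file_url, page):
    """Build a request for the "raster" or "text" endpoint.

    Assumption: the API takes a JSON POST body. Only the body fields
    ("file", "page") come from the post itself.
    """
    if kind not in ("raster", "text"):
        raise ValueError(f"unknown endpoint kind: {kind!r}")
    body = json.dumps({"file": file_url, "page": page}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{kind}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return (status, raw response bytes).

    The response format is not described in the post, so the bytes
    are returned undecoded.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status, resp.read()
```

For example, `send(build_request("text", "https://assets.extract-cdn.com/data/hd-receipt.pdf", 1))` would request the text of page 1 of the sample receipt.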

I’ve outlined more of the documentation here: https://extract.dev/docs

Under the hood, the API uses Poppler to extract text and rasterize pages. Note that text extraction returns the actual text encoded in the PDF and does not employ an OCR model. Give it a spin; I’m interested in your feedback on whether this is useful.
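For comparison, a single page can be processed locally with Poppler's command-line tools, `pdftotext` and `pdftoppm`. The sketch below only builds the command lines; it is not how the service invokes Poppler (the post does not say), and the function name, the 150 DPI default, and the output prefix are assumptions for illustration.

```python
import subprocess


def poppler_commands(pdf_path, page, dpi=150, out_prefix="page"):
    """Return argv lists for extracting text from, and rasterizing, one page.

    Hypothetical helper: pdftotext and pdftoppm are real Poppler tools,
    but the choice of flags here is just one reasonable setup.
    """
    first_last = ["-f", str(page), "-l", str(page)]  # limit to a single page
    text_cmd = ["pdftotext", *first_last, "-layout", pdf_path, "-"]  # "-" = stdout
    raster_cmd = ["pdftoppm", *first_last, "-r", str(dpi), "-png",
                  "-singlefile", pdf_path, out_prefix]  # writes <out_prefix>.png
    return text_cmd, raster_cmd


def extract_text(pdf_path, page):
    """Run pdftotext for one page and return its text (requires Poppler installed)."""
    text_cmd, _ = poppler_commands(pdf_path, page)
    result = subprocess.run(text_cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Like the API, `pdftotext` reads the text layer stored in the PDF and performs no OCR.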


Comments URL: https://news.ycombinator.com/item?id=45581760

Points: 1

# Comments: 0

Source: news.ycombinator.com
