Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
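As a rough illustration of the kind of extraction these libraries offer, here is a minimal PyMuPDF sketch that pulls plain text from each page of a PDF; the file path is a placeholder, not part of the original listing.

```python
# Minimal sketch: extract page text from a PDF with PyMuPDF.
import fitz  # PyMuPDF's import name

doc = fitz.open("sample.pdf")  # placeholder path
pages = []
for page in doc:
    # get_text() returns the page's plain text; other modes
    # ("blocks", "dict", "html") expose layout information.
    pages.append(page.get_text())
doc.close()

print("\n\n".join(pages))
```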
Web Crawler/Spider for NodeJS + server-side jQuery ;-)
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Open-source platform for extracting structured data from documents using AI.
Crawly, a high-level web crawling & scraping framework for Elixir.
Extract structured data from websites (web scraping).
A simple resume parser for extracting information from resumes.
Receipt scanner that extracts information from your PDF or image receipts, built in Node.js.
Turn webpages into LLM-friendly input text, similar to Firecrawl and the Jina Reader API. Makes RAG, AI web scraping, and image and webpage link extraction easy.
Extract data from .trace documents generated by Instruments
📄🔍 Parse, extract, and analyze documents with ease 📄🔍
wxpath - declarative web crawling with XPath; a Web Query Language (WQL)
Extract data from HTML tables.
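A generic sketch of this technique, using pandas rather than the listed repository's API: parse the `<table>` elements in an HTML snippet into DataFrames. The HTML here is made up for illustration.

```python
# Parse all <table> elements from HTML into pandas DataFrames
# (generic technique, not the listed project's API).
from io import StringIO
import pandas as pd

html = """
<table>
  <tr><th>name</th><th>stars</th></tr>
  <tr><td>pymupdf</td><td>5000</td></tr>
</table>
"""
tables = pd.read_html(StringIO(html))  # one DataFrame per <table>
print(tables[0])
```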
An R package for acquisition and processing of NASA SMAP data
Library and CLI for extracting data from HTML via CSS selectors.
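Since the repository itself is not named here, a hedged sketch of CSS-selector extraction using BeautifulSoup shows the general approach; the HTML snippet and selector are invented for illustration.

```python
# Generic CSS-selector extraction with BeautifulSoup
# (illustrates the technique, not the listed repository's API).
from bs4 import BeautifulSoup

html = '<ul><li class="repo">crawly</li><li class="repo">meltano</li></ul>'
soup = BeautifulSoup(html, "html.parser")
names = [li.get_text(strip=True) for li in soup.select("li.repo")]
print(names)  # ['crawly', 'meltano']
```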
FBLYZE is a Facebook scraping and analysis system.
Get lyrics for any song by just passing in the song name (spelled or misspelled) in less than 2 seconds, using this awesome Python library.
Extracting and parsing structured data with jQuery selectors, XPath, or JsonPath from common web formats like HTML, XML, and JSON.
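For the XPath side of that workflow, a minimal sketch with lxml (a generic stand-in, not the listed project's API); the XML fragment is made up for illustration, and the JsonPath and jQuery-selector paths are not shown.

```python
# XPath extraction sketch with lxml (generic technique).
from lxml import etree

xml = "<feed><entry><title>PyMuPDF</title></entry><entry><title>Crawly</title></entry></feed>"
root = etree.fromstring(xml)
titles = root.xpath("//entry/title/text()")
print(titles)  # ['PyMuPDF', 'Crawly']
```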