DEV Community • 2026-04-09 09:05
Marker, hosted: a scientific PDF parser API with LaTeX equations preserved
The problem
I kept hitting the same wall when building RAG pipelines over research papers: every generic PDF parser I tried mangled the equations.
Adobe Extract, AWS Textract, pdfplumber, PyMuPDF: they all collapse display math into plain-text garbage. $\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$ becomes something like:
QKT √dk
Attention(Q,K,V ) = softmax(
)V (1)
Unusable. Your ...