Find & Count Codes in Scanned PDFs — Even on Stained or Marked Pages
Real-world documents are messy: coffee stains, highlighter marks, fax lines, faint photocopies. Our OCR was built for exactly these — it reads the codes anyway, then finds and counts them for you.
Most search tools assume a clean, perfect document. Reality looks different. The invoice you need to check has a stamp over part of it. The shipping manifest came through a fax machine three times. The parts list is a tenth-generation photocopy with toner streaks. A standard Ctrl+F finds nothing in these, because they're scanned images with no text inside. And even the OCR built into big-brand PDF tools often gives up on imperfect pages.
PDF Everyday's OCR search was designed around messy, real documents — the kind that pile up in accounting, logistics and procurement. Here's how it works and what you can do with it.
How the OCR reads a damaged page
When you upload a scanned PDF, three things happen behind the scenes:
- Image preprocessing. Before reading, each page is converted to grayscale, contrast-boosted and sharpened. This makes faint photocopies darker, separates text from background stains, and rescues characters that a raw scan would blur together.
- Optical Character Recognition. The cleaned image is read by an OCR engine that reconstructs the actual letters and digits from their shapes — the same way you can still read a word with a coffee ring across it.
- Smart normalization. The recognized text is normalized so that punctuation, spacing and common scanning artifacts don't block a match. This is the part that makes code-hunting reliable.
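To make the preprocessing step concrete, here is a minimal, dependency-free sketch of grayscale conversion and contrast boosting. It is an illustration only, not PDF Everyday's actual code: the function names and the simple linear stretch are assumptions, sharpening is left out for brevity, and a production pipeline would more likely rely on an imaging library.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Gray = List[List[int]]         # rows of 0..255 intensities


def to_grayscale(rgb_rows: List[List[Pixel]]) -> Gray:
    """Collapse RGB to luminance using the common BT.601 weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def stretch_contrast(gray: Gray) -> Gray:
    """Linearly rescale so the darkest pixel becomes 0 and the lightest 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:               # flat image: nothing to stretch
        return [row[:] for row in gray]
    scale = 255 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in gray]


# A faint photocopy: mid-gray "ink" (150) on a light-gray background (200).
faint_page = [
    [(200, 200, 200), (150, 150, 150)],
    [(200, 200, 200), (200, 200, 200)],
]
boosted = stretch_contrast(to_grayscale(faint_page))
print(boosted)  # [[255, 0], [255, 255]]
```

After the stretch, the faint ink is pure black and the background pure white, which is the kind of separation an OCR engine benefits from.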
Why "smart normalization" matters for codes
Codes are printed inconsistently and scanned imperfectly. The same part number might appear as 0450906508HWS on one document and 0.450.906.508 HWS on another, and the scan itself might add a stray dot or drop a space. Our search ignores dots, dashes, slashes, underscores and spaces inside codes, so all of these match a single search:
| You type | It still finds |
|---|---|
| 0450906508HWS | 0.450.906.508 HWS · 0450 906 508 HWS |
| F026400683003 | F026-400-683-003 · F026 400 683 003 |
| 1987302501EWM | 1987.302.501 EWM |
You don't have to know how the code was printed — you just type the digits and letters, and OCR plus normalization does the rest.
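The idea behind the table can be sketched in a few lines of Python. This is a hedged illustration rather than the product's implementation: `normalize_code` is a hypothetical name, and uppercasing the result is an assumption the article does not state.

```python
import re

# Characters the search is described as ignoring inside codes:
# dots, dashes, slashes, underscores and whitespace.
_SEPARATORS = re.compile(r"[.\-/_\s]+")


def normalize_code(text: str) -> str:
    """Strip separators, then uppercase (uppercasing is an assumption)."""
    return _SEPARATORS.sub("", text).upper()


variants = ["0.450.906.508 HWS", "0450 906 508 HWS", "0450906508HWS"]
print({normalize_code(v) for v in variants})  # {'0450906508HWS'}
```

Because both the typed query and the OCR output pass through the same function, every printed variant collapses to one comparable string.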
🔍 Try it on your toughest document
Upload a stained, faxed or photocopied PDF and search for a code. Matches appear page by page as the engine scans — free, no sign-up.
Search a Scanned PDF →
What you can do with it
1. Find a single code instantly
Drop in a 200-page scanned batch, type the invoice or part number, and jump straight to the page it's on. No scrolling, no eyeballing.
2. Count how many times a code appears
Searching returns every page where a code occurs, so you can count occurrences across the whole document — useful for verifying quantities, checking how often a part is referenced, or confirming a code only appears where it should.
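If you export the recognized text yourself, counting can be sketched as below. This assumes you already have one string of OCR text per page; `count_code` and `normalize` are hypothetical helpers, and whether the product reports per-page counts in this form is not stated here.

```python
import re
from typing import Dict, List

_SEPARATORS = re.compile(r"[.\-/_\s]+")


def normalize(text: str) -> str:
    """Drop dots, dashes, slashes, underscores and whitespace; uppercase."""
    return _SEPARATORS.sub("", text).upper()


def count_code(pages: List[str], code: str) -> Dict[int, int]:
    """Map 1-based page number to the number of non-overlapping matches."""
    needle = normalize(code)
    if not needle:
        return {}
    counts: Dict[int, int] = {}
    for page_no, text in enumerate(pages, start=1):
        hits = normalize(text).count(needle)
        if hits:
            counts[page_no] = hits
    return counts


pages = [
    "Part 0.450.906.508 HWS x2, also 0450 906 508 HWS",
    "nothing here",
    "ref 0450906508HWS",
]
per_page = count_code(pages, "0450906508HWS")
print(per_page, sum(per_page.values()))  # {1: 2, 3: 1} 3
```

One simplification to be aware of: removing all spaces from a page means a match could, in rare cases, span two neighbouring tokens that were never meant as one code.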
3. Check many codes at once
Enter up to 20 codes separated by spaces. The engine reports which ones were found and on which pages, and which were missing — perfect for reconciling a list against a scanned document.
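A batch check of this kind could look roughly like the following sketch. The names `check_codes` and `normalize` are hypothetical, the input is assumed to be one OCR text string per page, and the 20-code limit simply mirrors the figure quoted above.

```python
import re
from typing import Dict, List, Tuple

_SEPARATORS = re.compile(r"[.\-/_\s]+")


def normalize(text: str) -> str:
    """Drop dots, dashes, slashes, underscores and whitespace; uppercase."""
    return _SEPARATORS.sub("", text).upper()


def check_codes(
    pages: List[str], query: str, limit: int = 20
) -> Tuple[Dict[str, List[int]], List[str]]:
    """Return ({code: [1-based pages]}, [codes not found]) for a spaced list."""
    codes = query.split()
    if len(codes) > limit:
        raise ValueError(f"at most {limit} codes per search")
    normalized_pages = [normalize(p) for p in pages]
    found: Dict[str, List[int]] = {}
    missing: List[str] = []
    for code in codes:
        needle = normalize(code)
        hit_pages = [
            n for n, text in enumerate(normalized_pages, start=1)
            if needle and needle in text
        ]
        if hit_pages:
            found[code] = hit_pages
        else:
            missing.append(code)
    return found, missing


pages = ["F026-400-683-003", "1987.302.501 EWM and F026 400 683 003"]
found, missing = check_codes(pages, "F026400683003 1987302501EWM 0450906508HWS")
print(found)    # {'F026400683003': [1, 2], '1987302501EWM': [2]}
print(missing)  # ['0450906508HWS']
```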
4. Cross-reference across documents
Run the same code list against different scanned files to confirm a part appears in the order, the invoice and the delivery note — catching mismatches that are easy to miss by hand.
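As a rough sketch of the same reconciliation done in code, the function below reports, for each code, which documents it is absent from. Everything here is illustrative: `cross_reference` and `normalize` are hypothetical helpers, and each document is assumed to be a list of per-page OCR text.

```python
import re
from typing import Dict, List

_SEPARATORS = re.compile(r"[.\-/_\s]+")


def normalize(text: str) -> str:
    """Drop dots, dashes, slashes, underscores and whitespace; uppercase."""
    return _SEPARATORS.sub("", text).upper()


def cross_reference(
    documents: Dict[str, List[str]], codes: List[str]
) -> Dict[str, List[str]]:
    """For each code, list the documents in which it does NOT appear."""
    normalized = {
        name: [normalize(page) for page in pages]
        for name, pages in documents.items()
    }
    gaps: Dict[str, List[str]] = {}
    for code in codes:
        needle = normalize(code)
        absent = [
            name for name, pages in normalized.items()
            if not any(needle in page for page in pages)
        ]
        if absent:
            gaps[code] = absent
    return gaps


documents = {
    "order": ["0450906508HWS"],
    "invoice": ["0.450.906.508 HWS", "F026-400-683-003"],
    "delivery": ["F026 400 683 003"],
}
gaps = cross_reference(documents, ["0450906508HWS", "F026400683003"])
print(gaps)  # {'0450906508HWS': ['delivery'], 'F026400683003': ['order']}
```

An empty result means every code turned up in every document; anything else points to the paperwork worth a second look.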
Who uses this
- Accountants & bookkeepers — locate and count invoice numbers across scanned ledgers and receipt batches.
- Logistics & customs teams — find waybill, container and HS codes in faxed or scanned shipping paperwork.
- Procurement & warehouse staff — verify part numbers and count references in scanned catalogs and packing lists.
- Insurance & legal — search marked-up, stamped or annotated scanned files for reference IDs and clause numbers.
Why it beats Ctrl+F and most online PDF tools
Ctrl+F only works on digital text — it finds nothing in a scan. Most popular online PDF suites focus on merging and converting and either lack OCR search entirely or only handle clean scans. PDF Everyday combines aggressive image cleanup, OCR, and code-aware normalization specifically so it keeps working when the document is far from perfect.
Frequently asked questions
Can it really read a stained or marked-up scan?
Yes. Pages are contrast-boosted and sharpened before OCR, which recovers text from faint photocopies, fax lines and many stains. Heavily destroyed areas may still fail, but partial marks and smudges usually don't stop a match.
Can I count how many times a code appears?
Yes. The search lists every page where each code is found, so you can count occurrences across the whole document.
How many codes can I search at once?
Up to 20 at a time, separated by spaces. You'll see which were found and where, and which were not.
Does the formatting of the code matter?
No. Dots, dashes, slashes, underscores and spaces inside codes are ignored, so you don't need to match the exact printed format.
Are my documents kept private?
Yes. Files are processed in memory and deleted immediately after the search — nothing is stored or shared.