Scanned PDFs need to be OCRed first (I have no plans to implement OCR, though PyMuPDF supports OCR through Tesseract). I also noticed that some PDFs generated by printing from browsers might contain "fake ...
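To check whether a file needs OCR before processing it, one option is to look for pages that contain images but almost no extractable text. The following is only a minimal sketch of that idea, not part of this project: the helper names `looks_scanned` and `find_scanned_pages` and the `min_chars` threshold of 20 are hypothetical, and the heuristic can misjudge pages that mix real text with large images.

```python
from typing import List


def looks_scanned(text: str, image_count: int, min_chars: int = 20) -> bool:
    """Heuristic: a page with at least one image and almost no
    extractable text is probably a scan that needs OCR.

    min_chars is an arbitrary, illustrative threshold.
    """
    return image_count > 0 and len(text.strip()) < min_chars


def find_scanned_pages(pdf_path: str, min_chars: int = 20) -> List[int]:
    """Return 0-based indices of pages that probably need OCR."""
    # PyMuPDF is imported lazily so looks_scanned() can be used
    # (and tested) without the library installed.
    import fitz

    scanned = []
    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc):
            text = page.get_text("text")          # plain text layer, if any
            images = page.get_images(full=True)   # images referenced by the page
            if looks_scanned(text, len(images), min_chars):
                scanned.append(index)
    return scanned
```

If `find_scanned_pages("input.pdf")` returns a non-empty list, those pages would have to be run through an OCR tool first so that they gain a text layer.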