OCR mixes characters 'A' and 'Å'


Using SubExtractor 1032d, I tried to OCR the subs for The Muppets (Sup-file here).
When OCR is finished and the SRT is saved and examined, I find that around half of all 'A' characters have been incorrectly OCR'ed as 'Å', and some 'Å' have been OCR'ed as 'A'.
(SRT in Notepad++, see line 149), (Sup viewed in SupRip)

I removed OcrMap*.bin from AppData\local and from the exe folder before launching SubExtractor.

I have tried converting the Sup file to idx/sub (using BDSup2Sub) before OCR'ing, but the end result is approximately the same: 'A' and 'Å' are still mixed up.


crmeadowcroft wrote Aug 15, 2013 at 4:15 PM

I've reproduced the problem. I'm looking into a fix.

jofafrazze wrote Aug 16, 2013 at 10:18 AM

Thanks for your efforts and for the best subtitle OCR program, by far!