Document Image Binarization
This repo has two parts. The first part, written in Python, implements a simple binarization algorithm.
The second part binarizes document images with an improved, contrast-maximization version of the methods of Niblack and Sauvola et al. It can also run the classical Niblack and Sauvola et al. methods. Details can be found in the ICPR 2002 paper.
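The three thresholding rules named above can be sketched as follows. This is a minimal pure-Python illustration, not the code from the repository. Each pixel is compared against a threshold T computed from the mean m and standard deviation s of a window around it. Niblack uses T = m + k*s. Sauvola et al. use T = m*(1 + k*(s/R - 1)). The contrast-maximization variant, as we understand the ICPR 2002 formulation, uses T = m - k*(1 - s/R)*(m - M), where M is the darkest gray level in the image and R is the largest local standard deviation. The helper names local_mean_std and binarize, the window radius r, the border clipping and the default parameters (k = -0.2 for Niblack, k = 0.5 otherwise, R = 128 for Sauvola) are assumptions chosen for illustration.

```python
import math


def local_mean_std(img, r):
    """Mean and standard deviation of the (2r+1) x (2r+1) window around each pixel.

    img is a list of rows of gray values (0-255). Windows are clipped at the
    image border, a simplification chosen for this sketch.
    """
    h, w = len(img), len(img[0])
    # Summed-area tables of the values and of their squares (one zero row/column of padding).
    s1 = [[0] * (w + 1) for _ in range(h + 1)]
    s2 = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            v = img[y][x]
            s1[y + 1][x + 1] = v + s1[y][x + 1] + s1[y + 1][x] - s1[y][x]
            s2[y + 1][x + 1] = v * v + s2[y][x + 1] + s2[y + 1][x] - s2[y][x]
    mean = [[0.0] * w for _ in range(h)]
    std = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            n = (y1 - y0) * (x1 - x0)
            total = s1[y1][x1] - s1[y0][x1] - s1[y1][x0] + s1[y0][x0]
            total_sq = s2[y1][x1] - s2[y0][x1] - s2[y1][x0] + s2[y0][x0]
            m = total / n
            mean[y][x] = m
            # Var = E[v^2] - m^2, clamped at 0 to absorb rounding error.
            std[y][x] = math.sqrt(max(total_sq / n - m * m, 0.0))
    return mean, std


def binarize(img, method="wolf", r=7, k=None, R=128.0):
    """Local-threshold binarization; returns a 0/255 image of the same size.

    method is "niblack", "sauvola" or "wolf" (contrast maximization).
    Pixels brighter than their local threshold T become white (255).
    """
    if method not in ("niblack", "sauvola", "wolf"):
        raise ValueError(f"unknown method: {method!r}")
    if k is None:
        k = -0.2 if method == "niblack" else 0.5  # assumed typical defaults
    mean, std = local_mean_std(img, r)
    min_gray = min(min(row) for row in img)        # M: darkest gray level in the image
    max_std = max(max(row) for row in std) or 1.0  # largest local std (avoids division by 0)
    out = []
    for row, m_row, s_row in zip(img, mean, std):
        out_row = []
        for v, m, s in zip(row, m_row, s_row):
            if method == "niblack":
                t = m + k * s
            elif method == "sauvola":
                t = m * (1 + k * (s / R - 1))
            else:  # wolf: R is replaced by the largest local std in the image
                t = m - k * (1 - s / max_std) * (m - min_gray)
            out_row.append(255 if v > t else 0)
        out.append(out_row)
    return out
```

For example, binarize(img, "sauvola", r=2) on a white page (gray 200) with a small dark blob (gray 30) keeps the blob black and the background white. With Niblack's rule a perfectly flat region gives T = m, so in this sketch flat background turns black; the Sauvola and contrast-maximization rules push T below m in low-contrast areas and avoid this.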
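The first algorithm is straightforward: convert the color image to PIL's 'L' mode (8-bit grayscale), then compare each pixel's gray value against a fixed threshold alpha = 127. Pixels brighter than the threshold become white (255) and the rest become black (0). The main code is: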
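import sys
from PIL import Image

# Usage: python binsimple.py <input image> <output image>
if len(sys.argv) != 3:
    sys.exit('Input format error! Usage: python binsimple.py <input> <output>')

inFile = sys.argv[1]
outFile = sys.argv[2]

# Convert to 8-bit grayscale ('L' mode): one value per pixel, 0 (black) to 255 (white).
im = Image.open(inFile).convert('L')
width, height = im.size
for i in range(width):
    for j in range(height):
        # Fixed global threshold of 127: brighter pixels become white, the rest black.
        if im.getpixel((i, j)) > 127:
            im.putpixel((i, j), 255)
        else:
            im.putpixel((i, j), 0)
im.show()
im.save(outFile)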
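The per-pixel getpixel/putpixel loop is slow on large scans. A possible alternative, offered as a suggestion rather than the repository's code, applies the same rule (gray > 127 becomes 255, otherwise 0) in one call through Pillow's Image.point with a 256-entry lookup table. The function name binarize_simple and its alpha parameter are hypothetical.

```python
from PIL import Image


def binarize_simple(im, alpha=127):
    """Fixed-threshold binarization of a PIL image via a lookup table.

    The image is first converted to 'L' (8-bit grayscale); each gray level v
    is then mapped to 255 if v > alpha, else 0.
    """
    lut = [255 if v > alpha else 0 for v in range(256)]
    return im.convert("L").point(lut)
```

Usage: binarize_simple(Image.open('sample.jpg')).save('res.jpg'). The lookup table is built once for the 256 possible gray levels, so the per-pixel work happens inside Pillow instead of in a Python loop.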
To run the script:
python binsimple.py sample.jpg res.jpg
where sample.jpg is the source image and res.jpg is the binarized output.
Here is an example:
For the source code, see the GitHub repository 0x333333/binarizewolfjolion.
Its README.md explains in detail how to compile and use it.
Here are three examples: