This repo has two parts. The first part, written in Python, implements a simple binarization algorithm.

The second part binarizes document images using an improved, contrast-maximizing version of the Niblack and Sauvola et al. methods. It can also run the classical Niblack and Sauvola et al. methods. Details can be found in the ICPR 2002 paper.

Usage:

Simple Binarization Algorithm

The first algorithm is simple: convert the color picture to mode 'L' (8-bit grayscale), then compare each pixel's gray value against a fixed threshold of 127. Pixels brighter than the threshold become white and all others become black. The main code is:

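import sys
from PIL import Image

# Expect exactly two arguments: the input path and the output path.
if len(sys.argv) != 3:
    print('Input format error!')
    print('Usage: python binsimple.py <input> <output>')
    sys.exit(1)

inFile = sys.argv[1]
outFile = sys.argv[2]

# Convert to mode 'L' (8-bit grayscale).
im = Image.open(inFile).convert('L')

# Pixels brighter than 127 become white (255); all others become black (0).
for i in range(im.size[0]):
    for j in range(im.size[1]):
        if im.getpixel((i, j)) > 127:
            im.putpixel((i, j), 255)
        else:
            im.putpixel((i, j), 0)

im.show()
im.save(outFile)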

To run this code:

python binsimple.py sample.jpg res.jpg

Here sample.jpg is the source picture and res.jpg is the binarized result.
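The per-pixel loop above can be slow on large images. As a minimal sketch of the same fixed-threshold idea, the comparison can be precomputed into a 256-entry lookup table and applied in one call with Pillow's `Image.point`. The function names `build_lut` and `binarize_file` are hypothetical and are not part of binsimple.py.

```python
def build_lut(threshold=127):
    """Return a 256-entry table: gray values above `threshold` map to 255, others to 0."""
    return [255 if value > threshold else 0 for value in range(256)]


def binarize_file(in_path, out_path, threshold=127):
    """Binarize one image file with a fixed global threshold (hypothetical helper)."""
    from PIL import Image  # imported here so build_lut can be used without Pillow

    gray = Image.open(in_path).convert('L')
    # Image.point applies the lookup table to every pixel of the 'L' image.
    gray.point(build_lut(threshold)).save(out_path)
```

Calling `binarize_file('sample.jpg', 'res.png')` would write the thresholded picture. Saving to a lossless format such as PNG keeps the output strictly two-valued, whereas JPEG compression may reintroduce intermediate gray values.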

Here is an example:

Figure 1: Original picture

Figure 2: Simple algorithm

Improved Contrast Maximization Algorithms

For the source code, see the GitHub repo 0x333333/binarizewolfjolion.

Its README.md gives detailed instructions on how to compile and use it.
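To give a rough idea of how the three local methods differ, here is a pure-Python sketch of their threshold formulas as they are commonly stated in the literature. It is not the code from binarizewolfjolion. The function names, the window clipping at image borders, the default `k=0.5` and `R=128`, and the `>` comparison are all assumptions made for illustration.

```python
import math


def window_stats(gray, cx, cy, half):
    """Mean and standard deviation of the window centred on (cx, cy), clipped at borders."""
    h, w = len(gray), len(gray[0])
    values = [
        gray[y][x]
        for y in range(max(0, cy - half), min(h, cy + half + 1))
        for x in range(max(0, cx - half), min(w, cx + half + 1))
    ]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def niblack_threshold(mean, std, k):
    # Niblack: T = m + k * s
    return mean + k * std


def sauvola_threshold(mean, std, k=0.5, R=128.0):
    # Sauvola et al.: T = m * (1 + k * (s / R - 1)), R = assumed dynamic range of s
    return mean * (1.0 + k * (std / R - 1.0))


def wolf_threshold(mean, std, k, min_gray, max_std):
    # Wolf et al.: T = m + k * (s / max_s - 1) * (m - M),
    # where M is the darkest gray value and max_s the largest window deviation.
    if max_std == 0:
        return mean  # flat image: fall back to the local mean (assumption)
    return mean + k * (std / max_std - 1.0) * (mean - min_gray)


def binarize_wolf(gray, half=1, k=0.5):
    """Two-pass sketch: gather window statistics, then threshold every pixel."""
    h, w = len(gray), len(gray[0])
    stats = [[window_stats(gray, x, y, half) for x in range(w)] for y in range(h)]
    min_gray = min(min(row) for row in gray)
    max_std = max(std for row in stats for _, std in row)
    result = []
    for y in range(h):
        out_row = []
        for x in range(w):
            mean, std = stats[y][x]
            t = wolf_threshold(mean, std, k, min_gray, max_std)
            out_row.append(255 if gray[y][x] > t else 0)
        result.append(out_row)
    return result
```

Here `gray` is a list of rows of gray values, for example `[[200, 200, 200], [200, 10, 200], [200, 200, 200]]`. Unlike Niblack and Sauvola, the Wolf threshold depends on two image-wide statistics, which is why the sketch needs a first pass over all windows before it can threshold any pixel.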

Here are three examples:

Figure 3: Original picture

Figure 4: Wolf et al. (2001) algorithm; needs black text on a white background

Figure 5: Sauvola et al. (1997) algorithm; needs black text on a white background

Figure 6: Niblack (1986) algorithm; needs white text on a black background