A few days ago, my wife asked me to help her grade some multiple-choice tests and, since I hate wasting time, I spent a few hours writing a script to automate the data extraction.
After some googling, I found an interesting article by Adrian Rosebrock on this topic, so I tried his approach, and the result of the first test was good.
Basically, this code converts the image to a black-and-white version and finds all the contours with findContours and grab_contours. The contours are then filtered and sorted into a matrix.
It worked well in many cases, but not in others, because I used irregular tests.
The author uses very restrictive checks to reduce the number of false contours, which is possible with clean cases:
if w >= 20 and h >= 20 and ar >= 0.9 and ar <= 1.1:
    questionCnts.append(c)
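The same test can be written with a tolerance on the aspect ratio, so it can be relaxed for irregular boxes. This is a hypothetical helper (is_checkbox, min_size and ar_tolerance are illustrative names): the defaults reproduce the strict check above, and the 0.25 used in the example is only an illustrative value for a relaxed ratio.

```python
def is_checkbox(w, h, min_size=20, ar_tolerance=0.1):
    """Return True if a w x h bounding rectangle looks like a checkbox.

    With the defaults this matches the strict test:
    w >= 20, h >= 20 and 0.9 <= w / h <= 1.1.
    A larger ar_tolerance accepts less regular (e.g. hand-drawn) boxes.
    """
    if w < min_size or h < min_size:
        return False
    ar = w / float(h)
    return 1 - ar_tolerance <= ar <= 1 + ar_tolerance


print(is_checkbox(30, 36))                     # strict: False (ratio ~0.83)
print(is_checkbox(30, 36, ar_tolerance=0.25))  # relaxed: True
```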
I made some changes to the code: in particular, I allowed a more flexible aspect ratio, and I stored the countNonZero value of each box in a matrix in order to determine a threshold value and use it to find the checked boxes. This made it possible to find multiple checked boxes on each line.
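A minimal sketch of the threshold step, assuming the counts matrix is already filled (for example with cv2.countNonZero on each box mask). Following the threshold_multiplier parameter described at the end of this post, each row's threshold is the row average times the multiplier; find_checked is a hypothetical name.

```python
import numpy as np


def find_checked(counts, threshold_multiplier=1.15):
    """Return a boolean matrix marking the checked boxes.

    counts[r][c] is the number of non-zero (ink) pixels in the box at
    row r, column c. A box is considered checked when its count exceeds
    the row average multiplied by threshold_multiplier, so a row can
    have zero, one or several checked boxes.
    """
    counts = np.asarray(counts, dtype=float)
    thresholds = counts.mean(axis=1, keepdims=True) * threshold_multiplier
    return counts > thresholds


counts = [[100, 100, 400, 100],
          [100, 380, 390, 100],
          [100, 100, 100, 100]]
print(find_checked(counts).tolist())
```

One property of a row-relative threshold: if every box in a row is checked, the average rises with them and no box may exceed it, so that case needs separate handling.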
I ran the second test and the results were better.
Assuming that the number of columns and rows is known, I tried skipping the contour-finding step and simply split the image into the correct number of sectors. The result of the third test was great.
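Splitting into equal sectors needs no contour detection at all. A minimal NumPy sketch, assuming the input is already a binary (thresholded) image that contains only the answer grid; grid_counts is a hypothetical name, and np.count_nonzero plays the role of cv2.countNonZero here.

```python
import numpy as np


def grid_counts(thresh, rows_count, cols_count):
    """Split a binary image into rows_count x cols_count equal cells and
    return the number of non-zero pixels in each cell."""
    h, w = thresh.shape[:2]
    counts = np.zeros((rows_count, cols_count), dtype=int)
    for r in range(rows_count):
        # Integer cell boundaries, so sizes that don't divide evenly still work.
        y0, y1 = r * h // rows_count, (r + 1) * h // rows_count
        for c in range(cols_count):
            x0, x1 = c * w // cols_count, (c + 1) * w // cols_count
            counts[r, c] = np.count_nonzero(thresh[y0:y1, x0:x1])
    return counts
```

The resulting matrix can then go through the same row-average threshold as before.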
In a real case, though, the checkboxes are probably on a large sheet with some text, where the contour-search approach should work well, so I tried it; the best result of this fourth test was as I expected.
Of course, with bigger checkboxes, or with a well-formed test in general, the results are better, but my goal is to create a script that works in all cases, given the correct configuration.
In test number five, I added a bounding box as a new parameter, and all the results were perfect.
This solution is a little more complex for the user but, for my purposes, more flexible.
To run the tests, download this folder, create a virtualenv and install the requirements:
$ virtualenv .virtualenv --system-site-packages
$ source .virtualenv/bin/activate
$ pip install -r requirements.txt
$ python test1.py
The output images will be created in the images/res folder.
The code snippet by Adrian Rosebrock works very well with his own test, but it is very specific. Identifying a few parameters makes the result better. The parameters I identified are:
rows_count = 4: the number of questions/rows of answers
cols_count = 6: the number of columns for each question/answer
threshold_multiplier = 1.15: the multiplier applied to each row's average value to define the threshold value
x, y, w, h = 1215, 822, 306, 229: the bounding rectangle of the answers

This solution is not universal, but it works well in my cases and allows the result to be fixed by changing a few parameters.
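The four parameters can be bundled and applied in one pass. This is a hypothetical composition (GridConfig and read_answers are illustrative names), assuming a binary image where ink pixels are non-zero: it crops the bounding rectangle, splits it into rows_count x cols_count cells, counts the ink in each cell, and applies the row-average threshold.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class GridConfig:
    rows_count: int              # number of questions/rows of answers
    cols_count: int              # number of columns for each question
    threshold_multiplier: float  # applied to each row's average ink count
    box: tuple                   # (x, y, w, h) bounding rectangle of the answers


def read_answers(thresh, cfg):
    """Return, for each row, the list of checked column indices."""
    x, y, w, h = cfg.box
    roi = thresh[y:y + h, x:x + w]  # keep only the answer area
    rh, rw = roi.shape[:2]          # actual size, in case the box exceeds the image
    counts = np.zeros((cfg.rows_count, cfg.cols_count))
    for r in range(cfg.rows_count):
        y0, y1 = r * rh // cfg.rows_count, (r + 1) * rh // cfg.rows_count
        for c in range(cfg.cols_count):
            x0, x1 = c * rw // cfg.cols_count, (c + 1) * rw // cfg.cols_count
            counts[r, c] = np.count_nonzero(roi[y0:y1, x0:x1])
    limits = counts.mean(axis=1, keepdims=True) * cfg.threshold_multiplier
    return [np.flatnonzero(row).tolist() for row in counts > limits]
```

With the values listed above, the configuration would be GridConfig(4, 6, 1.15, (1215, 822, 306, 229)); only the box normally changes from one scanned sheet layout to another.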