An Unsuccessful Attempt to Use OCR to Pass Text-Based Captchas in Selenium Automated Tests
Not recommended, but a fun exercise in using OCR
First of all, Captchas are designed to stop automation. Ideally, Captchas would be disabled for automated testing, but this is not always possible (out of your control or due to human reasons).
Most Captchas these days are more advanced: they use images and heavily distorted text. Even so, some sites still use more basic text Captchas, like the one below.
Today, I came up with the idea of using OCR (Optical Character Recognition) in automated test scripts to parse text-based Captchas like the one above. I will give the result first: very low accuracy, i.e. of no practical use. However, I think it is a good exercise.
Tesseract OCR library
Tesseract is a popular, free and open-source OCR library that runs on multiple platforms. Use your package manager to install it, for example, Homebrew on macOS.
brew install tesseract
Usage: tesseract <image_file> <output_file>, for example:
tesseract captcha1.png output
The on-screen output looks like this:
Estimating resolution as 272
and a new file, output.txt, was created. In this case, its content is as follows (the first character is wrong; the Captcha actually reads 3g34):
4g34
Advanced Tesseract
We can further improve the recognition accuracy by tweaking some of Tesseract’s configuration, e.g. telling it that the image contains one word, one character, or vertically aligned text.
Here I will use the page segmentation mode. For instance, if I know the Captcha is always one word (3g34 in our example), I can specify this with the relevant page segmentation mode (8: "Treat the image as a single word."). Our new command looks like this:
tesseract captcha1.png output --psm 8
And when reading output.txt you will see the result is:
3g34
It is correct! For more information on the other modes and configurations, see the tesseract manual page.
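If the command is going to be driven from a test script, it can help to build it as an argument list rather than a single string. The following is a minimal sketch, not part of the original experiment: `tesseract_command` is a hypothetical helper name, and it passes `stdout` as the output base (which Tesseract accepts in place of a file name), so no output.txt is written.

```ruby
# Hypothetical helper: builds the argument list for a Tesseract run.
# psm is the page segmentation mode (8 = treat the image as a single word).
def tesseract_command(image_path, psm: nil)
  cmd = ["tesseract", image_path.to_s, "stdout"]
  cmd += ["--psm", psm.to_s] unless psm.nil?
  cmd
end

puts tesseract_command("captcha1.png", psm: 8).join(" ")
# prints: tesseract captcha1.png stdout --psm 8
```

Keeping the arguments in an array means the image path never needs shell quoting when the list is later handed to something like `Open3.capture3(*cmd)`.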
Use Tesseract in Automated Tests
Test Design:
- Save the Image
- Run Tesseract to ‘read’ the captcha text
- Use it in the test scripts
Saving the Image
Firstly, we need to get the Captcha image for Tesseract to analyse. Selenium 4 allows you to take a screenshot of a web element.
require "fileutils"

tmp_dir = File.expand_path File.join(File.dirname(__FILE__), "..", "testdata")
dest_image_file_path = File.join(tmp_dir, "page_captcha.png")
FileUtils.rm dest_image_file_path if File.exist?(dest_image_file_path)
# save the captcha image, selenium 4 new feature
elem.save_screenshot(dest_image_file_path)
expect(File.exist?(dest_image_file_path)).to be true
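The path handling above can be wrapped in a small helper so that a missing testdata folder or a leftover image from a previous run does not interfere. This is only a sketch: `fresh_image_path` is a hypothetical name, and creating the folder on demand is an assumption about what is wanted here.

```ruby
require "fileutils"

# Hypothetical helper: returns a clean path for the captcha screenshot.
def fresh_image_path(dir, file_name = "page_captcha.png")
  FileUtils.mkdir_p(dir)   # create the folder if it is missing
  path = File.join(dir, file_name)
  FileUtils.rm_f(path)     # remove a stale image; no error if it is absent
  path
end
```

In the test it could be used as `elem.save_screenshot(fresh_image_path(tmp_dir))`, where `elem` and `tmp_dir` are the variables from the snippet above.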
Run Tesseract to ‘read’ the captcha text
Now we want to execute the command. In Ruby, we can simply put backticks (`) around it. Alternatively, you can look into the system command.
captcha_value = `tesseract #{dest_image_file_path} output; cat output.txt`
captcha_value = captcha_value.force_encoding('UTF-8').gsub(" ", "").strip
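The backtick call goes through a shell, so an image path containing spaces would break it, and it leaves an output.txt behind. A possible alternative, sketched here under assumptions, uses Ruby's standard `Open3.capture3` and Tesseract's `stdout` output base. The names `clean_captcha_text` and `read_captcha` are hypothetical, and the default of `--psm 8` simply carries over the single-word assumption from earlier.

```ruby
require "open3"

# Removes all whitespace (spaces, newlines, and the trailing form feed
# that Tesseract may append) from the raw OCR output.
def clean_captcha_text(raw)
  raw.dup.force_encoding("UTF-8").scrub("").gsub(/\s+/, "")
end

# Hypothetical helper: runs Tesseract on the image and returns the cleaned
# text, or nil if the command exits with an error.
def read_captcha(image_path, psm: 8)
  stdout, _stderr, status =
    Open3.capture3("tesseract", image_path, "stdout", "--psm", psm.to_s)
  status.success? ? clean_captcha_text(stdout) : nil
end
```

With this in place, the two lines above would become `captcha_value = read_captcha(dest_image_file_path)`. Note that `Open3.capture3` raises `Errno::ENOENT` if the tesseract executable is not installed.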
Now that we have the Captcha text that Tesseract ‘read’, we want to put it back into the page as our ‘guess’.
driver.find_element(:name, "checkcode").send_keys(captcha_value)
Demo
Below is a video of the test refreshing the Captcha and running Tesseract on it 5 times.
We can see in the above video (an animated GIF) that it got the Captcha correct once (sCg5) 🥳.
As in our first example, Tesseract is not accurate here, and not suitable for automated testing at all. Having said that, it is interesting, and I only played with Tesseract for 30 minutes. There might be tweaks that suit our image type.
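With accuracy this low, many guesses are visibly malformed before they are even submitted. One cheap tweak is to check the guess against the expected shape and refresh the Captcha instead of submitting an obviously bad value. The sketch below assumes, based only on the two examples in this article (3g34 and sCg5), that this site's Captchas are exactly four letters or digits; `plausible_captcha?` is a hypothetical helper name.

```ruby
# Assumption: captchas on this site are exactly 4 letters or digits.
CAPTCHA_PATTERN = /\A[A-Za-z0-9]{4}\z/

# Returns true when the OCR guess at least has the expected shape.
def plausible_captcha?(text)
  CAPTCHA_PATTERN.match?(text.to_s)
end
```

This does not make Tesseract any more accurate. It only filters out guesses that cannot be right, such as an empty string or one containing punctuation.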
Complete Code
load File.dirname(__FILE__) + "/../test_helper.rb"

describe "Use Tesseract to get through Captcha" do
  include TestHelper

  before(:all) do
    # browser_type, browser_options, site_url are defined in test_helper.rb
    @driver = $driver = Selenium::WebDriver.for(browser_type, browser_options)
    driver.manage().window().resize_to(1280, 720)
    driver.get(site_url)
    visit("/member/login")
  end

  after(:all) do
    driver.quit unless debugging?
  end

  it "Tesseract Captcha" do
    login_page = LoginPage.new(driver)
    5.times do
      # clicking the captcha image refreshes it
      elem = driver.find_element(:xpath, "//img[@title='refresh']")
      elem.click
      sleep 1

      tmp_dir = File.expand_path File.join(File.dirname(__FILE__), "..", "testdata")
      dest_image_file_path = File.join(tmp_dir, "page_captcha.png")
      FileUtils.rm dest_image_file_path if File.exist?(dest_image_file_path)
      # save the captcha image, selenium 4 new feature
      elem.save_screenshot(dest_image_file_path)
      expect(File.exist?(dest_image_file_path)).to be true

      # run Tesseract, then read its result from output.txt
      captcha_value = `tesseract #{dest_image_file_path} output; cat output.txt`
      captcha_value = captcha_value.force_encoding('UTF-8').gsub(" ", "").strip
      puts captcha_value

      login_page.enter_captcha_code(captcha_value)
      sleep 2
    end
  end
end