Automated PDF Testing in Selenium WebDriver

Many websites feature links that download a PDF rather than just opening one. These PDF files’ content might be static (e.g. a restaurant menu or a booklet), or dynamically generated (e.g. a bank statement or a student’s grade report).

This tutorial will show you how to allow PDF downloads and verify the PDF contents in an automated test.

Test Design

  1. Navigate to a web page and download the PDF
    For my example, I’m downloading a book sample PDF at http://zhimin.com/books/pwta. Before you download it, configure where the browser saves the file on your machine.
  2. Verify the downloaded PDF exists
    Once the browser has downloaded the file, check that it was saved successfully on your machine.
  3. Read and verify the PDF’s contents
    This step is not only for dynamically generated PDFs. It’s good practice to verify your PDF’s contents, even if it’s a static PDF.

Saving the download file to a specific location

First, let’s make sure we save the PDF to an area safe for testing. To test safely and avoid conflicts, download your PDF file to a test folder, and delete the PDF file after the test executes.
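One way to get such a throwaway folder is Ruby’s standard Dir.mktmpdir. The sketch below is only an illustration: the helper name with_download_dir is hypothetical (it is not part of Selenium or this tutorial’s test helper), and it assumes the browser would be started inside the block with this folder as its download directory.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: yields a fresh temporary folder and removes it
# (together with any downloaded files) once the block finishes,
# even if the block raises.
def with_download_dir
  dir = Dir.mktmpdir("pdf-download-test")
  yield dir
ensure
  FileUtils.remove_entry(dir) if dir && File.directory?(dir)
end

with_download_dir do |dir|
  # In a real test, pass `dir` as download.default_directory and run the
  # download here; this line only stands in for a downloaded file.
  File.write(File.join(dir, "sample.pdf"), "placeholder")
  puts File.exist?(File.join(dir, "sample.pdf"))
end
```

Because the folder is created fresh for each run, there is nothing left over from earlier runs to conflict with.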

PDF verification library

Then use the ‘pdf-reader’ gem to verify the PDF. Install it on your command line with:

gem install pdf-reader

Open browser with specified download folder

To set a download location in Selenium WebDriver, you must update your browser options.

before(:all) do
  # set up download settings
  @download_path = "/Users/courtney/tmp"
  options = Selenium::WebDriver::Chrome::Options.new
  options.add_preference("download.prompt_for_download", false)
  options.add_preference("download.default_directory", @download_path)

  @driver = $driver = Selenium::WebDriver.for(:chrome, :options => options)
  driver.get(site_url)
end

Setting prompt_for_download to false means that you won’t receive a pop-up asking what to name the file and where to save it to. Instead, it will save it under the default name in the default_directory. Now run the sample script and check if the PDF file is in the download directory.

Download and Verify the downloaded file

To download the PDF, click the webapp’s download link. In your test, ensure you have a sleep immediately after to allow time for the file download to complete.

driver.find_element(:link_text, "Download").click
sleep 10
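A fixed sleep is simple, but it waits the full time even when the download finishes early and fails when the download is slower than expected. As an alternative, here is a minimal polling sketch. The helper name wait_for_download is hypothetical, and it assumes Chrome’s habit of keeping in-progress downloads in files ending in .crdownload inside the download folder.

```ruby
# Hypothetical helper: polls until `path` exists and no in-progress
# ".crdownload" file remains in the same folder, or until `timeout`
# seconds have passed. Returns true if the file arrived in time.
def wait_for_download(path, timeout: 10, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  partial_pattern = File.join(File.dirname(path), "*.crdownload")
  loop do
    return true if File.exist?(path) && Dir.glob(partial_pattern).empty?
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end
```

In the test, a call such as expect(wait_for_download(saved_file, timeout: 30)).to be_truthy could then stand in for the sleep and the existence check.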

With a script that downloads the PDF in place, the next step is to verify that the file is there. Ruby’s File.exist?(file_path) method does this. (The older File.exists? alias was removed in Ruby 3.2, so use File.exist?.)

expect(File.exist?("#{@download_path}/sample.pdf")).to be_truthy

Note that be_truthy passes for any value other than false or nil, whereas eq(true) passes only for true itself. Because File.exist? always returns true or false, the two behave the same here, and I find be_truthy more readable than eq(true).

If I run my current script: success! My test on the sample website passes.

it "Download PWTA sample" do
  visit("/books/pwta")
  driver.find_element(:link_text, "Download").click
  sleep 10 # allow time for the download to finish

  saved_file = "#{@download_path}/practical-web-test-automation-sample.pdf"
  expect(File.exist?(saved_file)).to be_truthy
end

Verify the PDF’s contents

Our PDF exists, but we aren’t entirely done yet. How can we be sure the PDF we downloaded is valid (openable) and correct (contents-wise)?

Here I will use the PDF reader gem, pdf-reader, to get the text contents and verify the PDF.

First, let’s load the PDF file.

reader = PDF::Reader.new(saved_file)

Verify PDF page count

To verify if we can open the PDF, we can use pdf-reader’s page_count.

expect(reader.page_count).to be > 0 # verify PDF can be opened
expect(reader.page_count).to eq(62)

Verify PDF contents

Now we can verify the PDF’s contents. Note that pdf-reader works by treating each page separately. This means that you will need to loop through all the pages to read the whole PDF or use indexing to go to a particular page.
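To illustrate the looping approach, here is a small sketch that searches every page for a phrase. The helper name pages_containing is hypothetical; it only relies on the pages and text methods used in this tutorial, so it is written against any object that offers them.

```ruby
# Hypothetical helper: returns the 1-based numbers of the pages whose
# text contains `phrase`. `reader` is expected to be a PDF::Reader
# instance, but any object whose `pages` respond to `text` will work.
def pages_containing(reader, phrase)
  reader.pages.each_with_index.filter_map do |page, index|
    index + 1 if page.text.include?(phrase)
  end
end
```

With the reader loaded earlier, an assertion such as expect(pages_containing(reader, "Test web applications wisely with Selenium WebDriver")).to include(2) would check that the phrase appears on the second page without hard-coding the index.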

For my sample PDF, the first page is the cover image. There’s no text here, so I will verify text on the second page instead.

second_page_text = reader.pages[1].text
expect(second_page_text).to include("Test web applications wisely with Selenium WebDriver")

With this, I have completed verifying my PDF’s contents. pdf-reader has other methods of reading apart from just ‘text’. It can also handle PDF metadata, page orientation and raw-content streams, which may be helpful in assertions.
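As a rough sketch of using metadata in assertions, the helper below gathers a few document-level values into one hash. The name pdf_summary is hypothetical; it assumes reader is a PDF::Reader instance and uses its info, page_count and pdf_version methods. Which keys appear in info depends on how the PDF was produced, so :Title and :Author may well be missing.

```ruby
# Hypothetical helper: collects a few document-level facts from a reader.
# `info` is guarded in case the PDF carries no document information.
def pdf_summary(reader)
  info = reader.info || {}
  {
    page_count: reader.page_count,
    pdf_version: reader.pdf_version,
    title: info[:Title],   # nil when the PDF has no Title entry
    author: info[:Author]  # nil when the PDF has no Author entry
  }
end
```

For the sample PDF, expect(pdf_summary(reader)[:page_count]).to eq(62) would repeat the page count check from earlier, and the other keys can be asserted once you know what your PDF generator writes.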

Completed Test Script

load File.dirname(__FILE__) + "/../test_helper.rb"
require "fileutils"
require "pdf-reader"

describe "PDF Download and Verification" do
  include TestHelper

  before(:all) do
    # set up download settings
    @download_path = "/Users/courtney/tmp"
    options = Selenium::WebDriver::Chrome::Options.new
    options.add_preference("download.prompt_for_download", false)
    options.add_preference("download.default_directory", @download_path)
    @driver = Selenium::WebDriver.for(:chrome, :options => options)
    driver.get(site_url)
  end

  after(:all) do
    driver.quit unless debugging?
  end

  it "Download PWTA sample" do
    visit("/books/pwta")
    saved_file = "#{@download_path}/practical-web-test-automation-sample.pdf"
    # remove any copy left over from a previous run
    FileUtils.rm(saved_file) if File.exist?(saved_file)
    driver.find_element(:link_text, "Download").click
    sleep 10 # allow time for the download to finish
    expect(File.exist?(saved_file)).to be_truthy

    reader = PDF::Reader.new(saved_file)
    puts reader.info
    expect(reader.page_count).to eq(62)
    second_page_text = reader.pages[1].text
    puts second_page_text
    expect(second_page_text).to include("Test web applications wisely with Selenium WebDriver")
  end
end

Notes

  • Use a path relative to your project for the download directory rather than a hard-coded one. I used a machine-specific absolute path in this tutorial, but that causes problems when the test runs on a different machine or is shared with others.
  • I suggest removing the file before the download begins. This avoids conflicts with copies left over from previous test runs. I did this in the script above with FileUtils.rm(saved_file), guarded by a check that the file exists.
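Both notes can be combined in one small setup step. The sketch below is an assumption-laden illustration: the helper name prepare_download_dir and the tmp/downloads folder are hypothetical, so adjust them to your project layout. The relative location is expanded to an absolute path before use, which matches the kind of path this tutorial hands to the browser.

```ruby
require "fileutils"

# Hypothetical helper: builds a download folder under `base_dir`, makes
# sure it exists, and clears out any stale copy of `file_name`.
# Returns the absolute folder path and the expected file path.
def prepare_download_dir(base_dir, file_name)
  download_path = File.expand_path("tmp/downloads", base_dir)
  FileUtils.mkdir_p(download_path)
  saved_file = File.join(download_path, file_name)
  FileUtils.rm_f(saved_file) # does nothing if the file is absent
  [download_path, saved_file]
end
```

In before(:all), something like @download_path, @saved_file = prepare_download_dir(__dir__, "practical-web-test-automation-sample.pdf") would then replace the hard-coded path.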
