Case Study: Extract All Substack Article Titles and Links. Part C: Extract All
Handling Pagination.
3 min read · Dec 14, 2024
Non-Medium Members: Read this article free on Substack.
This article series:
- Part A: Extract Article Data
- Part B: Extract 25 articles on one page
- Part C: Extract All
- Part D: Publish
- Part E: Annotation by Zhimin Zhan*
(offering valuable tips for test automation engineers to level up their skills, exclusively available on Substack)
After Part B, I had the data for all 25 articles on the first page saved in a proper CSV file.
Extract All 500+ Articles
Let’s focus on extracting the 2nd page’s articles first.
First, click the “Next Page” button:
# Scroll down to the bottom so the pagination buttons are in view
driver.action.scroll_by(0, 2500).perform
next_button_xpath = ".../button[2]" # hide xpath intentionally
next_page_btn = driver.find_element(:xpath, next_button_xpath)
next_page_btn.click
sleep 2 # fixed pause to let the next page's articles load
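To go from the 2nd page to all pages, the same scroll-find-click steps can be repeated in a loop. Below is a minimal sketch, not the series' actual implementation. The helper name `each_page`, the `max_pages` safety cap, and the stop condition (the “Next Page” button being absent or disabled on the last page) are all assumptions for illustration; the real page may signal its last page differently. The XPath is passed in by the caller, since it is intentionally hidden above.

```ruby
# Hypothetical helper (not from the original article): yields the current
# page number, then clicks "Next Page" until there is no usable button.
# Assumption: on the last page the button is missing or disabled.
def each_page(driver, next_button_xpath, max_pages: 50, pause: 2)
  page = 1
  loop do
    yield page                      # run the per-page extraction here
    break if page >= max_pages      # safety cap against an endless loop

    driver.action.scroll_by(0, 2500).perform # to the bottom, as above
    # find_elements returns an empty array instead of raising when
    # nothing matches, which makes the "no more pages" check simple
    buttons = driver.find_elements(:xpath, next_button_xpath)
    break if buttons.empty? || !buttons.first.enabled?

    buttons.first.click
    sleep pause                     # fixed pause while the next page loads
    page += 1
  end
  page                              # number of pages visited
end
```

In a real script this would be called with the Selenium `driver` and the hidden XPath, for example `each_page(driver, next_button_xpath) { |n| ... }`, with the per-page extraction from Part B placed inside the block. The fixed `sleep` mirrors the snippet above; an explicit wait on the new page's content would likely be more reliable.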