Case Study: Extract All Substack Article Titles and Links. Part D: Generate HTML and Publish
Generate HTML from the extracted data and invoke an API to publish the content.
4 min read · Dec 21, 2024
Non-Medium members: read this article free on Substack.
This article series:
- Part A: Extract Article Data
- Part B: Extract 25 articles on one page
- Part C: Extract All
- Part D: Generate HTML and Publish
- Part E: Annotation by Zhimin *
(offering valuable tips for test automation engineers to level up their skills, exclusively available on Substack)
We now have data for over 500 articles spread across 21 CSV files. To process them all at once, we first combine them into a single CSV file.
Aggregate CSVs
In a terminal, run the following command (on Unix, macOS, or WSL on Windows) in the folder containing the generated CSV files.
% cat *.csv >> substack-published-articles-aggregated.csv
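Because each CSV file begins with its own header row, the aggregated file contains that header more than once. Create a Ruby script (shown below) to remove the duplicate header rows, keeping only the first one.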
found_head_row = false
lines = ["Title,Subtitle,Published On,Link"]…
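For readers who want a self-contained version to experiment with, here is a minimal sketch of the same idea. It is not necessarily the author's script: the method name `remove_duplicate_headers` and the output file name `substack-published-articles-cleaned.csv` are hypothetical, and it assumes every source CSV ended with a newline so that each repeated header sits on its own line.

```ruby
# A sketch, not the article's original script.
# Header text taken from the article; everything else here is illustrative.
HEADER = "Title,Subtitle,Published On,Link"

# Keeps the first header row and drops every later exact copy of it.
# All other lines are kept in their original order.
def remove_duplicate_headers(lines, header = HEADER)
  found_head_row = false
  lines.each_with_object([]) do |line, kept|
    row = line.chomp
    if row == header
      next if found_head_row # a repeated header: skip it
      found_head_row = true
    end
    kept << row
  end
end

# Input name comes from the `cat` command above; the output name is hypothetical.
input = "substack-published-articles-aggregated.csv"
output = "substack-published-articles-cleaned.csv"

if File.exist?(input)
  cleaned = remove_duplicate_headers(File.readlines(input))
  File.write(output, cleaned.join("\n") + "\n")
end
```

Writing to a separate output file keeps the aggregated file untouched, so the step can be re-run safely if something looks wrong in the result.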