Removing Amazon links from WordPress

In my previous post, I said I was resolved to remove all Amazon Affiliate links from my WordPress blog.

I had no idea how how I was going to do that. I’d placed such links, in varying forms and blog incarnations, for a couple of decades. There were hundreds of links in dozens of posts. How could I hope to edit them all?

I figured it out. Whatever else you may say about me, I am a sysadmin.

(Note the lack of any adjectives like “good” in the previous paragraph. That judgement must be made by others.)

Anyway, here is my UNIX-command formula:

mysqldump <db-user> -u <db-name> wp_posts -p > wp_posts.sql
perl -pe 's|<a href=\\"https://amazon.com/.+?\\">(.+?)</a>|$1|g' wp_posts.sql \
   > wp_posts_pass1.sql
perl -pe 's|<a href=\\"https://www.amazon.com/.+?\\">(.+?)</a>|$1|g' wp_posts_pass1.sql \
   > wp_posts_pass2.sql
perl -pe 's|<a href=\\"https://amzn.to/.+?\\">(.+?)</a>|$1|g' wp_posts_pass2.sql \
   > wp_posts_pass3.sql
mysql -u <db-user> -p <db-name> < wp_posts_pass3.sql

This did not spring forth from my mind like Athena from Zeus. I already knew, from moving my WordPress sites to my private server, that the text of all my blog posts were in a mysql table named wp_posts. I still had to figure out:

  • how to dump just a single mysql table from a specific database (the easiest part of this process);
  • the approximate regular-expression pattern I’d need to remove the HTML link but the leave the text it enclosed unchanged;
  • that sed would not support non-greedy regex qualifiers;
  • how perl regex differs from sed regex;
  • how to load the revised table back into mysql without quivering in fear;
  • that it’s spelled “amazon” and not “amazom”

I don’t think I could have done this with tools like cPanel and phpMyAdmin. I may be old-fashioned (I’m still crotchety when it comes to Python), but the UNIX command line was definitely the way to go.

If you’re copying the formula, note that it’s not suitable for a production environment. Since I’m in control of my site, I knew there would be no transactions to the mysql database as I did this work. If there was a chance of transactions mid-update, I would have had to think about using mysql’s REGEX_REPLACE and database/table locking.

I don’t pretend that I’ve gotten to every single affiliate link in this blog. It’s possible that there are some that are not of form https://amazon.com/, https://www.amazon.com/, or https://amzn.to/.

But it’s good enough to salve my troubled conscience.

Postcript:

I know there are WordPress plug-ins that purport to do this kind of thing. I even installed Better Search Replace. I tested it by updating all the links from argothald.wordpress.com (the link to an older version of my blog) to argothald.com.

But though that plug-in will let you do regex search-and-replace, you have to pay for the Pro version for that feature. Even then, I felt far more comfortable testing my regex expressions on the command line than risk making a mistake in some plug-in’s text input field.

This Post Has One Comment

  1. William Seligman

    Post-postscript:

    I discovered that there were a hundred or so very old links of the form http://www.amazon.com (that is, http instead of https) lurking in old blog posts. Fortunately, the above formula made short work of them.

Leave a Reply