I am in the process of migrating an old website built in custom PHP into WordPress. The website has ~450,000 pages on it. I believe I have nearly 100% of the content migrated into WordPress.
However, to preserve SEO value, keep page load times reasonable, and so on, I need help with some NGINX configuration.
I can give SSH access via pem/ppk.
Configuration task:
Creating 301 redirects. The URL structure of the old site is massively incompatible with WordPress. The old URLs have spaces all over them (so ‘+’ when URL-encoded). Additionally, the old URLs contain many characters that WordPress will not support. I found all of these characters in various URLs: ß, é, è, ê, ˜e, ë, ñ, ž, ü, †, á, å, à, ä, â, ã, í, ï, î, ç, š, ý, ú, ù, ø, õ, ð, ó, ö, œ, æ, —, £, €, ¥, ™, ©, º, ², ³, », ƒ, ¡, ¿, ‡, ¢, ¬, „, ‘, ¦, “, ”, ‚, ‰, ˆ
I changed the structure of the URLs during the migration to play nicely with WordPress permalinks. Now I need 301 redirects so that each old URL is redirected to the appropriate page in WP.
I stripped the garbage out of the URLs in the WP conversion, so NGINX must apply the same clean-up to incoming requests. I think this requires a simple Perl subroutine that processes the incoming URL, strips/substitutes characters, and then issues a 301 redirect to the resulting URL. If the request is already for the WordPress-compliant URL, NGINX should simply serve it without redirecting.
A young man from Ghana whom I found on Fiverr looked at this, but he could not get it to work as needed. He has already recompiled NGINX on my server to include [login to view URL], and he wrote a Perl sub, which is not yet doing what I need it to do. This may therefore be very simple work for you to finish, or you may have to start from scratch with a Perl script that transforms the URLs properly.
In terms of the exact URL migration, I did string manipulation in MariaDB to strip the junk out of the URLs (there are around 450,000 URLs) and transform them to be WordPress-compliant. As such, performing the exact logical equivalent of these transformations should enable NGINX to serve the proper page.
The exact transformation I'm using in MariaDB is this: LOWER(REGEXP_REPLACE(REGEXP_REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(url, '$', 's'), '@', 'a'), '\t', 't'), '\r', 'r'), ' & ', ' and '), ' ', '-'), '[^a-zA-Z0-9\-]', ''), '[-]+', '-'))
I simply need the exact same thing done when a URL comes into NGINX so that the old URLs are mapped to the new ones.
Given the MySQL/MariaDB syntax above, I think the exact logic of the NGINX rewrite needs to be as follows (going from innermost function call to outermost function call):
- replace all '$' with 's'
- replace all '@' with 'a'
- replace '\t' with 't'
- replace '\r' with 'r'
- replace ' & ' with ' and '
- replace ' ' with '-'
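- regex replace '[^a-zA-Z0-9\-]' with '' (i.e., delete every character that is not a letter, digit, or dash, exactly as the MariaDB expression does)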
- regex replace '[-]+' with '-' (i.e., if there are multiple dashes in a row, replace all with a single dash)
- replace all capital letters with lowercase
- return a 301 redirect
A Perl script should be able to do this URL manipulation for NGINX, and NGINX can then return a 301 redirect to the resulting URL.
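For anyone checking a Perl sub against the MariaDB expression, here is a minimal reference sketch of the same steps in Python. It is not the Perl code itself. It follows the SQL as written, so disallowed characters are deleted rather than replaced. The names to_wp_slug and redirect_target are made up for illustration. It makes three assumptions: '\t' and '\r' in the SQL mean the tab and carriage-return characters (MariaDB's default string-literal escapes), incoming URLs encode spaces as '+', and the transformation is applied to one path segment at a time (applied to a whole path, the regex would delete the slashes too).

```python
import re
from typing import Optional
from urllib.parse import unquote_plus


def to_wp_slug(old: str) -> str:
    """Apply the MariaDB transformation, innermost call first."""
    s = old.replace("$", "s")
    s = s.replace("@", "a")
    # Assumption: '\t' and '\r' in the SQL literal are tab and carriage
    # return (default sql_mode), not a backslash followed by a letter.
    s = s.replace("\t", "t")
    s = s.replace("\r", "r")
    s = s.replace(" & ", " and ")
    s = s.replace(" ", "-")
    # The SQL replaces disallowed characters with '' (deletes them).
    s = re.sub(r"[^a-zA-Z0-9\-]", "", s)
    # Collapse runs of dashes into a single dash.
    s = re.sub(r"-+", "-", s)
    return s.lower()


def redirect_target(raw_segment: str) -> Optional[str]:
    """Return the slug to 301-redirect to, or None if no redirect is needed.

    Assumption: raw_segment is a single percent-encoded path segment in
    which '+' stands for a space, as described for the old site.
    """
    slug = to_wp_slug(unquote_plus(raw_segment))
    return None if slug == raw_segment else slug


if __name__ == "__main__":
    print(to_wp_slug("Café & Crème $ale"))         # caf-and-crme-sale
    print(redirect_target("Caf%C3%A9+%26+Cr%C3%A8me+%24ale"))  # caf-and-crme-sale
    print(redirect_target("caf-and-crme-sale"))    # None (already compliant)
```

Running a sample of old URLs through a function like this and comparing the results with the slugs stored in WordPress is one way to confirm that the NGINX-side logic matches the database transformation before going live.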