Raspberry Pi step-by-step instructions for adding Automatic Canonical Header Generation for PDF, TXT and/or other file types with an Apache 2 Web Server.
These procedures apply to Raspberry Pi 5, 4 or 3 with Raspberry Pi OS (64-Bit), (32-Bit) or (Legacy, 32-Bit) running Apache 2 with or without Let's Encrypt Certificates and Certbot.
General Notes
1. General:
The procedures below are optimized for adding Automatic Canonical Header Generation for PDF, TXT and/or other file types to a Raspberry Pi 5, 4 or 3 with Raspberry Pi OS (64-Bit), (32-Bit) or (Legacy, 32-Bit) running an Apache 2 Web Server with or without HTTPS using Let's Encrypt Certificates and Certbot.
2. Automatic Canonical Header Generation for PDF, TXT and/or Other File Types:
Search engines use canonical information when building their indexes and deciding which pages to include in search results.
Unlike HTML and similar file types, PDF, TXT and other file types have no provision inside the file for canonical information, so it has to be delivered in an HTTP response header instead.
The steps below are for implementing automatic canonical header generation for PDF, TXT and/or other file types being served by the Web Server.
Alternatively, Web Servers can be configured on a file-by-file basis to deliver canonical information in headers, but that approach is not discussed further here.
3. Internet access during setup:
Many of the steps below require the target Raspberry Pi to be connected to a network with access to the Internet.
Notice about updates, upgrades and installations failing due to repository or network congestion or outages
Occasionally updates, upgrades and installations fail due to repository or network congestion or outages.
Sometimes there is a message saying so, sometimes a missing file is reported, and sometimes there is just a failure message without an explanation.
When this occurs, simply run the command again.
If that does not solve the issues immediately, try again later.
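The retry advice above can also be scripted. The following is a minimal sketch and not part of the original procedure: retry is a hypothetical helper name, and the attempt count and delay are arbitrary assumptions.

```shell
# Hypothetical helper: run a command up to MAX times, pausing DELAY seconds
# between failed attempts. Returns 0 on the first success, 1 if all fail.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $attempt of $max failed: $*" >&2
        if [ "$attempt" -lt "$max" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (commented out so the sketch has no side effects):
# retry 3 60 sudo apt update
```

With these assumed values, the commented example would try the update up to three times, one minute apart, and stop at the first success.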
Automatic Canonical Header Generation for PDF, TXT and/or other file types
Notes:
See item 2 under "General Notes" near the top of this document.
The procedure below is for implementing automatic canonical header generation for PDF, TXT and/or other file types being served by HTTPS.
The procedure can easily be adapted to other file types and to files served by HTTPS, HTTP or both.
To configure automatic canonical header generation for files being served by HTTPS, Let's Encrypt Certificates and Certbot must already be set up and configured.
Automatic canonical header generation for files being served by HTTP can be configured whether or not Let's Encrypt Certificates and Certbot are set up and configured.
When Let's Encrypt Certificates and Certbot are set up and configured for a website, HTTP traffic is automatically redirected (301) to HTTPS for that website by default. Only the HTTPS portion of the automatic canonical header generation then requires configuration for that website, unless the automatic redirection has been manually disabled (removed from or commented out in the website's HTTP Virtual Host).
If Let's Encrypt Certificates and Certbot are not set up and configured for a website, then only the HTTP portion of the automatic canonical header generation requires configuration for that website.
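To see which of these cases applies to a site, the response to a plain HTTP request can be inspected. A minimal sketch, not part of the original procedure: redirects_to_https is a hypothetical helper that reads response headers (as printed by curl -sI) on standard input and succeeds only when it sees a 301 status together with an https:// Location header.

```shell
redirects_to_https() {
    # Strip carriage returns (HTTP header lines end in CRLF), then require
    # a 301 status line and a Location header pointing at an https:// URL.
    tr -d '\r' | awk '
        NR == 1 && $2 == "301" { status = 1 }
        tolower($1) == "location:" && $2 ~ /^https:\/\// { loc = 1 }
        END { exit !(status && loc) }
    '
}

# Example (requires the site to be reachable):
# curl -sI http://exampledomain1.com/somefile.pdf | redirects_to_https \
#     && echo "HTTP is redirected; configure the HTTPS Virtual Host only"
```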
Update Raspberry Pi OS and Components
Download the latest package lists
sudo apt update -y
Download and install the updated packages listed in the package lists
sudo apt full-upgrade -y
Enable the Headers Module mod_headers
sudo a2enmod headers
Enable the Rewrite Module mod_rewrite
sudo a2enmod rewrite
Reload the Apache 2 Web Server
sudo systemctl reload apache2
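As an optional check that is not part of the original steps, the loaded modules can be listed with apache2ctl -M. The sketch below assumes the module names headers_module and rewrite_module as they appear in that listing; modules_loaded is a hypothetical helper that reads the listing on standard input.

```shell
modules_loaded() {
    # Read the module list once, then require both names to be present.
    list=$(cat)
    for module in headers_module rewrite_module; do
        if ! printf '%s\n' "$list" | grep -q "^ *${module} "; then
            echo "missing: $module" >&2
            return 1
        fi
    done
    return 0
}

# Example: sudo apache2ctl -M | modules_loaded && echo "both modules are loaded"
```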
Reconfigure the HTTPS Virtual Hosts (Web Servers) ONLY when Let's Encrypt Certificates and Certbot are set up and configured for websites
Apache 2 supports one or more Virtual Hosts on a single machine. In the examples below, two (2) Virtual Hosts are being reconfigured: exampledomain1.com and exampledomain2.com.
Note:
In the examples below:
Replace exampledomain1.com and exampledomain2.com with your domain names.
Adjust (pdf|txt) as appropriate. Examples: (pdf) or (pdf|txt|xlsx).
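Before editing a Virtual Host, a candidate extension list can be tried against a few sample paths. This is a rough sketch only: it uses grep -E -i to approximate the RewriteRule pattern and its NC flag from the Virtual Host edits in this procedure, and canonical_path is a hypothetical helper, not something Apache provides.

```shell
canonical_path() {
    # $1 = extension alternation such as 'pdf|txt'; $2 = request path.
    # Prints the path without its leading slash when the pattern matches,
    # mirroring what the rule stores in CANONICAL_PATH; fails otherwise.
    if printf '%s\n' "$2" | grep -E -i -q "^/?(.+\.($1))\$"; then
        printf '%s\n' "${2#/}"
    else
        return 1
    fi
}

canonical_path 'pdf|txt' /docs/Manual.PDF    # prints docs/Manual.PDF
canonical_path 'pdf|txt' /index.html || echo "no match"
```

Paths that fail here would not receive a canonical header under the same extension list.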
Disable the HTTPS Virtual Hosts to be reconfigured
Add these seven lines just above </VirtualHost> near the bottom of the file:
# Automatically set canonical headers for all PDF and TXT files
# Enable rewriting
RewriteEngine On
# Capture canonical path for PDF and TXT files (case-insensitive, strip leading slash)
RewriteRule ^/?(.+\.(pdf|txt))$ - [E=CANONICAL_PATH:$1,NC]
# Add canonical header using mod_headers
Header set Link "<https://www.exampledomain1.com/%{CANONICAL_PATH}e>; rel=\"canonical\"" env=CANONICAL_PATH
Save and close mousepad
File | Save [Ctrl+S] and File | Quit [Ctrl+Q] or X out
- or -
Save and close nano
Press CTRL + X and then press y and ENTER to save changes
Launch Mousepad from Terminal in the Raspberry Pi GUI (Desktop)
sudo mousepad /etc/apache2/sites-available/exampledomain2.com-le-ssl.conf
- or -
Launch nano from Terminal
sudo nano /etc/apache2/sites-available/exampledomain2.com-le-ssl.conf
Add these seven lines just above </VirtualHost> near the bottom of the file:
# Automatically set canonical headers for all PDF and TXT files
# Enable rewriting
RewriteEngine On
# Capture canonical path for PDF and TXT files (case-insensitive, strip leading slash)
RewriteRule ^/?(.+\.(pdf|txt))$ - [E=CANONICAL_PATH:$1,NC]
# Add canonical header using mod_headers
Header set Link "<https://www.exampledomain2.com/%{CANONICAL_PATH}e>; rel=\"canonical\"" env=CANONICAL_PATH
Save and close mousepad
File | Save [Ctrl+S] and File | Quit [Ctrl+Q] or X out
- or -
Save and close nano
Press CTRL + X and then press y and ENTER to save changes
Test the HTTPS configurations (Optional)
sudo apachectl configtest
You should now see:
Syntax OK
If a site needs to be edited again, disable the site before editing it using the sudo a2dissite command with the syntax noted above.
After editing the site, save the changes and enable the site again using the sudo a2ensite command with the syntax noted above, then reload Apache using the sudo systemctl reload apache2 command so that it picks up the new configuration.
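The a2dissite and a2ensite commands take the name of the site's configuration file. As a hedged sketch, assuming the Certbot-style file name exampledomain1.com-le-ssl.conf (by analogy with the exampledomain2.com file opened in the editor step), the full edit cycle might look like the following. run and edit_cycle are hypothetical helpers, and with DRY_RUN left at its default of 1 the commands are only printed, not executed.

```shell
# Print each command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

edit_cycle() {
    conf=$1   # e.g. exampledomain1.com-le-ssl.conf (assumed file name)
    run sudo a2dissite "$conf"
    run sudo nano "/etc/apache2/sites-available/$conf"
    run sudo a2ensite "$conf"
    run sudo apachectl configtest
    run sudo systemctl reload apache2
}

edit_cycle exampledomain1.com-le-ssl.conf
```

The configuration test is placed after a2ensite in this sketch so that the re-enabled site is included in the check before Apache is reloaded.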
Test HTTPS Automatic Canonical Header Generation (Optional)
curl -I https://exampledomain1.com/<ApplicableFileOnTheWebServer> - Example: curl -I https://exampledomain1.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <https://www.exampledomain1.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.
curl -I https://www.exampledomain1.com/<ApplicableFileOnTheWebServer> - Example: curl -I https://www.exampledomain1.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <https://www.exampledomain1.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.
curl -I https://exampledomain2.com/<ApplicableFileOnTheWebServer> - Example: curl -I https://exampledomain2.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <https://www.exampledomain2.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.
curl -I https://www.exampledomain2.com/<ApplicableFileOnTheWebServer> - Example: curl -I https://www.exampledomain2.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <https://www.exampledomain2.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.
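The four checks above can be automated. The sketch below is one possible approach rather than part of the original procedure: canonical_target is a hypothetical helper that reads response headers on standard input and prints the URL from a Link header marked rel="canonical". The host names are the placeholders used throughout this document.

```shell
canonical_target() {
    # Drop CRs, keep only Link headers marked rel="canonical", and print the
    # URL between the angle brackets. Fails if no such header is present.
    tr -d '\r' | awk '
        tolower($1) == "link:" && /rel="canonical"/ {
            if (match($0, /<[^>]+>/)) {
                print substr($0, RSTART + 1, RLENGTH - 2)
                found = 1
            }
        }
        END { exit !found }
    '
}

# Example against the placeholder hosts (requires the sites to be live):
# for host in exampledomain1.com www.exampledomain1.com; do
#     curl -sI "https://$host/somefile.pdf" | canonical_target
# done
```

Because the Header line hard-codes the www host name, requests to the bare domain and to the www domain should both print the same www URL.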
Reconfigure the HTTP Virtual Hosts (Web Servers) ONLY when HTTP to HTTPS redirection has been disabled or Let's Encrypt Certificates and Certbot are NOT set up and configured for websites
Apache 2 supports one or more Virtual Hosts on a single machine. In the examples below, two (2) Virtual Hosts are being reconfigured: exampledomain1.com and exampledomain2.com.
Note:
In the examples below:
Replace exampledomain1.com and exampledomain2.com with your domain names.
Adjust (pdf|txt) as appropriate. Examples: (pdf) or (pdf|txt|xlsx).
Add these seven lines just above </VirtualHost> near the bottom of the file:
# Automatically set canonical headers for all PDF and TXT files
# Enable rewriting
RewriteEngine On
# Capture canonical path for PDF and TXT files (case-insensitive, strip leading slash)
RewriteRule ^/?(.+\.(pdf|txt))$ - [E=CANONICAL_PATH:$1,NC]
# Add canonical header using mod_headers
Header set Link "<http://www.exampledomain1.com/%{CANONICAL_PATH}e>; rel=\"canonical\"" env=CANONICAL_PATH
Save and close mousepad
File | Save [Ctrl+S] and File | Quit [Ctrl+Q] or X out
- or -
Save and close nano
Press CTRL + X and then press y and ENTER to save changes
Launch Mousepad from Terminal in the Raspberry Pi GUI (Desktop)
sudo mousepad /etc/apache2/sites-available/exampledomain2.com.conf
- or -
Launch nano from Terminal
sudo nano /etc/apache2/sites-available/exampledomain2.com.conf
Add these seven lines just above </VirtualHost> near the bottom of the file:
# Automatically set canonical headers for all PDF and TXT files
# Enable rewriting
RewriteEngine On
# Capture canonical path for PDF and TXT files (case-insensitive, strip leading slash)
RewriteRule ^/?(.+\.(pdf|txt))$ - [E=CANONICAL_PATH:$1,NC]
# Add canonical header using mod_headers
Header set Link "<http://www.exampledomain2.com/%{CANONICAL_PATH}e>; rel=\"canonical\"" env=CANONICAL_PATH
Save and close mousepad
File | Save [Ctrl+S] and File | Quit [Ctrl+Q] or X out
- or -
Save and close nano
Press CTRL + X and then press y and ENTER to save changes
Test the HTTP configurations (Optional)
sudo apachectl configtest
You should now see:
Syntax OK
If a site needs to be edited again, disable the site before editing it using the sudo a2dissite command with the syntax noted above.
After editing the site, save the changes and enable the site again using the sudo a2ensite command with the syntax noted above, then reload Apache using the sudo systemctl reload apache2 command so that it picks up the new configuration.
Test HTTP Automatic Canonical Header Generation (Optional)
curl -I http://exampledomain1.com/<ApplicableFileOnTheWebServer> - Example: curl -I http://exampledomain1.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <http://www.exampledomain1.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.
curl -I http://www.exampledomain1.com/<ApplicableFileOnTheWebServer> - Example: curl -I http://www.exampledomain1.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <http://www.exampledomain1.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.
curl -I http://exampledomain2.com/<ApplicableFileOnTheWebServer> - Example: curl -I http://exampledomain2.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <http://www.exampledomain2.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.
curl -I http://www.exampledomain2.com/<ApplicableFileOnTheWebServer> - Example: curl -I http://www.exampledomain2.com/somefile.pdf
You should see something like:
HTTP/1.1 200 OK
Link: <http://www.exampledomain2.com/somefile.pdf>; rel="canonical"
If you see the Link: header, your automation is working.
Remove packages that were automatically installed and are no longer required
Updates, upgrades and installations occasionally install additional packages automatically that are no longer required afterward. These can be removed automatically.
Automatically detect and remove packages no longer required