Implementing style sheets on multiple HTML files with a Perl script
In recent times, people have discovered Cascading Style Sheets (CSS) as a means to control the look of an entire website from a single file. CSS can also reduce the use of tables and clear up confusing HTML code, making the HTML easier to maintain and more efficient in terms of file size. To do this, however, each HTML page on the site needs to be linked to the style sheet.
Unfortunately, the most obvious way to do this is to open each HTML file, and to enter a line under the <HEAD> tag that reads: <link rel = "stylesheet" type="text/css" href = "stylesheet.css">. Thus, you may end up having to do this for ten, twenty, even a hundred HTML pages. There must be a better way.
There is a better way. To successfully carry out this plan, you need a Perl interpreter. You need copies of the web site’s files and folders on your computer. Finally, you need to compose and execute a simple Perl script on your interpreter. Through this, every file can be changed so as to include the line of code that links the style sheet to the HTML file. Now every file will be linked to the one style sheet, allowing the entire site to be changed instantly. Here, I will show you how to write that simple Perl script that will allow you to perform the miracle described above.
Why Perl?
Perl is an interpreted language that is widely used in Common Gateway Interface (CGI) programming to create dynamically generated web sites. Its major strength for this is its text-processing capability, which can be used for other purposes such as this one. HTML files are, after all, nothing more than plain text files. The markup allows the browser to render them as web pages, but the underlying code is nothing but text. Thus, we can process this text with a Perl script in order to insert the magic line, the text string <link rel = "stylesheet" type="text/css" href = "stylesheet.css">.
As we want to enter this line immediately following the first <HEAD> tag, and each HTML file contains but one <HEAD> tag, our plan is simple: search every HTML file for the text string <HEAD> and replace it with a new text string, <head> <link rel = "stylesheet" type="text/css" href = "stylesheet.css">. Note that if the head tag in your HTML files is written <head>, you should search for that instead of <HEAD>.
The engine of change
One line of code in the Perl script replaces <HEAD> with <head><link rel="stylesheet" type="text/css" href="style/stylesheet.css">. Here the style sheet is kept in a folder named style, hence the path style/stylesheet.css. The key line in the script is the simple regular expression that performs the substitution.
$line =~ s/<HEAD>/<head><link rel="stylesheet" type="text\/css" href="style\/stylesheet.css">/;
This statement tells Perl where to search and what to replace. Placing $line to the left of the =~ operator means that the search is carried out in the text string held in the scalar variable $line. The s stands for substitute: any <HEAD> found in $line is replaced with the text we supply. The text to be replaced comes first, immediately followed by the replacement text, and the parts are separated by the “/” symbol. Because “/” is the separator, any forward slash that belongs to the text itself must be escaped with a backslash “\”, as in style\/stylesheet.css. The double quotes need no escaping here.
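To see the substitution in isolation, here is a minimal sketch that applies it to a sample string rather than a file. The sample text in $line and the use of braces as delimiters are assumptions made for illustration. With s{...}{...} the forward slashes need no backslashes, and the i modifier lets the pattern match either <HEAD> or <head>.

```perl
use strict;
use warnings;

# Sample input line (hypothetical; a real script would read it from a file).
my $line = "<HEAD>\n";

# The same substitution, written with brace delimiters so the forward
# slashes in the replacement need no escaping. The trailing "i" makes
# the match case-insensitive, so <head> is matched as well.
$line =~ s{<HEAD>}{<head><link rel="stylesheet" type="text/css" href="style/stylesheet.css">}i;

print $line;
```

Whether to use the slash or the brace form is a matter of taste; the brace form is simply easier to read when the text contains many slashes.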
Where does this fit in?
You may wonder where this fits in. What is this scalar variable called $line? It’s a text string, of course, as we search for a pattern of text within it, the pattern that is the text string <HEAD>. How do we get this text string? We must start by opening one of the HTML files in order to get such a string. So, with this code, the file is opened.
unless (open(INPUT, "C:/file/file1.html")) {
die ("Cannot open file file1.html\n");
}
With this, we open the file and associate it with the “file handle” that I call “INPUT”. If the file cannot be opened, the “die” function is executed, stopping the program and informing the user that it was unable to open file1.html.
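As an aside, the same step can be sketched with the three-argument form of open and a lexical file handle, a style many Perl programmers prefer today. This is an alternative to the code above, not a correction of it. The temporary file created with File::Temp is an assumption that stands in for C:/file/file1.html so the example can run anywhere, and the names $in and $first_line are made up for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file so the example is self-contained
# (it stands in for C:/file/file1.html).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print $tmp_fh "<HEAD>\n<TITLE>Test</TITLE>\n";
close($tmp_fh);

# Three-argument open: the mode '<' (read) is given separately from the
# file name. $! holds the operating system's reason if the open fails.
open(my $in, '<', $path) or die "Cannot open file $path: $!\n";
my $first_line = <$in>;    # read one line from the file handle
close($in);

print $first_line;
```

Including $! in the message tells the user why the file could not be opened, for example because it does not exist or permission was denied.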
Where is the $line?
$line does not yet exist. The file and its contents are represented by the file handle INPUT. Thus, we must retrieve all the contents of the file by assigning the contents of INPUT to an array of text strings that will represent the text. Perl arrays are known by the prefix “@”, as Perl scalar variables are known by the prefix “$”. $line is a scalar variable. This is the name that will be given to each element of our array @input.
@input = <INPUT>;
With this, we assign the contents of our file to the array @input. Each line in the HTML file becomes an individual element in the array. Our file is thus represented by @input and with our next line of code, each of the HTML file’s lines is represented by $line as we loop through the @input array. We loop through that array in our quest for the <HEAD> text to be replaced by our code that will link our HTML file to our style sheet.
foreach $line(@input) {
$line =~ s/<HEAD>/<head><link rel="stylesheet" type="text\/css" href="style\/stylesheet.css">/;
}
We have seen one of these lines before; the line that effects the substitution. A foreach statement loops through each $line in our array @input that contains the contents of our HTML file. This causes the program to search each line for the text string <HEAD> and when found it is replaced. The result is contained in the newly-processed array @input. Thus, @input now represents the contents of our transformed HTML document. The file itself has still not been changed, as this operation merely read from the file and did not write to it. The program has read from it, placed its contents in an array and changed the contents of the array by looping through each element of the array and subjecting it to a regular expression that performed the substitution.
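The reason the array itself changes is that, inside a foreach loop, the loop variable is an alias for the current element rather than a copy of it. The following sketch shows this with a small made-up @input array instead of a real file; the three sample lines are assumptions for illustration.

```perl
use strict;
use warnings;

# A small stand-in for the lines read from an HTML file.
my @input = ("<HTML>\n", "<HEAD>\n", "<TITLE>Home</TITLE>\n");

# $line is an alias for each element in turn, so the substitution
# modifies the array in place.
foreach my $line (@input) {
    $line =~ s/<HEAD>/<head><link rel="stylesheet" type="text\/css" href="style\/stylesheet.css">/;
}

# The second element now carries the link tag; the others are untouched.
print @input;
```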
We then close the file with the following:
close (INPUT);
So when is the file changed?
The next lines of code change the file. This is done by reopening the same file, this time in write-only mode. Opening it this way erases the file's contents, so at this point the file is empty.
unless (open(OUTPUT, ">C:/file/file1.html")) {
die ("Cannot open file OUTPUT\n");
}
Notice the file has a new file handle: OUTPUT. The “>” before the file name tells the interpreter that this is a write-only file operation. Now we have an empty file, and we still have the array of text strings named @input, which contains the HTML file’s changed contents. Thus, our next move is to write the contents of @input to our empty HTML file, which is represented by the file handle OUTPUT. After that, we can close the file.
print OUTPUT @input;
close(OUTPUT);
The array of text strings called @input thus is printed to the file represented by the file handle OUTPUT.
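One detail is worth checking when an array is printed: whether it sits inside double quotes. The sketch below, using a made-up three-line @input, compares the two forms. Inside double quotes Perl joins the elements with a space, so every line after the first would gain a leading space in the output file. The variable names $quoted and $plain are assumptions for the illustration.

```perl
use strict;
use warnings;

my @input = ("<HTML>\n", "<HEAD>\n", "</HTML>\n");

# Interpolated in double quotes, the elements are joined with the
# list separator $" (a single space by default).
my $quoted = "@input";

# Joined with an empty string, which matches what "print @input" writes
# when the output field separator $, is left at its default.
my $plain = join('', @input);

print $quoted;
print $plain;
```

For writing a file back unchanged apart from the substitution, the unquoted form is the safer choice.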
Taken together, this is what our code looks like.
unless (open(INPUT, "C:/file/file1.html")) {
die ("Cannot open file file1.html\n");
}
@input = <INPUT>;
foreach $line(@input) {
$line =~ s/<HEAD>/<head><link rel="stylesheet" type="text\/css" href="style\/stylesheet.css">/;
}
close (INPUT);
unless (open(OUTPUT, ">C:/file/file1.html")) {
die ("Cannot open file OUTPUT\n");
}
print OUTPUT @input;
close(OUTPUT);
That only changes one file. How about the others?
The means by which the file is introduced to this Perl script is found here:
unless (open(INPUT, "C:/file/file1.html")) {
A careful inspection of the syntax reveals that “C:/file/file1.html” is a string of text. Such a string can be represented by a scalar variable. Better still, it can be represented by an element of an array of text strings, each of which represents an individual HTML file name.
We can describe an array as @array, or as $array[], with the square brackets representing indexed elements of the array; $array[0], $array[1], $array[2], etc. Thus, what we need is an array of text strings, each of which represents an individual file name. Let’s call this array @file. The first element of the array, $file[0], will be our “C:/file/file1.html”. Our next, $file[1] will be “C:/file/file2.html”, and so it continues through each file name; each file name as an element of our array @file. So, we code it as follows;
@file = (
"C:/file/file1.html",
"C:/file/file2.html",
"C:/file/file3.html",
"C:/file/file4.html");
You may have noticed that we never describe any array element as $file[0] ; that must wait for the next bit of code. Here, we replace
unless (open(INPUT, "C:/file/file1.html")) {
die ("Cannot open file file1.html");
}
with
unless (open(INPUT, $file[0])) {
die ("Cannot open file $file[0]");
}
The code, as it stands, still only processes one element in the array: $file[0], which refers to C:/file/file1.html. A loop statement is needed to process every element in the array and, through it, every file whose name is entered as an element of this array. This can be done with a for statement, using a counter that is also used to index the elements of the array.
for ($i=0; $i<@file; $i++)
{
unless (open(INPUT, "$file[$i]")) {
die ("Cannot open file $file[$i]\n");
}
In Perl, an expression such as @file can return an integer value equal to the number of elements in the array. Thus, the variable $i counts up to the number of elements in the array. In this way, each file is processed as intended.
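The following sketch illustrates the point about @file yielding a count. It uses the same hypothetical file names as above but only collects them in a list instead of opening them, so it can run without the files being present. The names $count, @visited and @visited_foreach are made up for the example. It also shows that a foreach loop visits the same elements without a counter.

```perl
use strict;
use warnings;

# Hypothetical file names, as in the @file array above.
my @file = ("C:/file/file1.html", "C:/file/file2.html", "C:/file/file3.html");

# Assigned to a scalar (or used in a numeric comparison), @file is
# evaluated in scalar context and yields the number of elements.
my $count = @file;

# Counter-based loop, as in the article.
my @visited;
for (my $i = 0; $i < @file; $i++) {
    push @visited, $file[$i];
}

# foreach walks the same elements without an index.
my @visited_foreach;
foreach my $name (@file) {
    push @visited_foreach, $name;
}

print "$count files\n";
```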
The entire code
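#!/usr/bin/perl
# List of HTML files to be linked to the style sheet
@file = ("C:/file/file1.html",
"C:/file/file2.html",
"C:/file/file3.html"
) ;
for ($i=0; $i<@file; $i++)
{
# Read the whole file into the array @input
unless (open(INPUT, "$file[$i]")) {
die ("Cannot open file $file[$i]\n");
} #end unless statement
@input = <INPUT>;
# Add the style sheet link after the <HEAD> tag
foreach $line(@input) {
$line =~ s/<HEAD>/<head><link rel="stylesheet" type="text\/css" href="style\/stylesheet.css">/;
} #end foreach loop
close (INPUT);
# Reopen the same file for writing and save the changed lines
unless (open(OUTPUT, ">$file[$i]")) {
die ("Cannot open file OUTPUT\n");
}#end unless statement
print OUTPUT @input;
close(OUTPUT);
} # end for loop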
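For readers who want to try the idea before touching a real site, here is a self-contained sketch of the same read, substitute and write cycle, wrapped in a subroutine and run against a temporary file. It is one possible arrangement under stated assumptions, not the script above: the subroutine name add_stylesheet_link, the temporary file created with File::Temp, and the case-insensitive match that stops after the first head tag are all choices made for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: insert the style sheet link after the first
# <HEAD> or <head> tag in the file at $path.
sub add_stylesheet_link {
    my ($path) = @_;

    # Read every line of the file into @input.
    open(my $in, '<', $path) or die "Cannot open file $path: $!\n";
    my @input = <$in>;
    close($in);

    foreach my $line (@input) {
        # Stop after the first replacement; a page has one head tag.
        last if $line =~ s{<head>}{<head><link rel="stylesheet" type="text/css" href="style/stylesheet.css">}i;
    }

    # Reopen the same file for writing (this empties it) and save the lines.
    open(my $out, '>', $path) or die "Cannot write file $path: $!\n";
    print $out @input;
    close($out);
}

# Demonstration on a throwaway file instead of a real page.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print $fh "<HTML>\n<HEAD>\n<TITLE>Home</TITLE>\n</HEAD>\n</HTML>\n";
close($fh);

add_stylesheet_link($path);

# Read the file back to inspect the result.
open(my $check, '<', $path) or die "Cannot open file $path: $!\n";
my @result = <$check>;
close($check);
print @result;
```

Since the script overwrites each file in place, it is prudent to run it on copies of the site's files first, as suggested at the start of this article.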