
htmlcutter

Slice up text files

Before I wrote the rather large documentation set for ExhibitPlus, I was sitting around thinking about how someone who likes to author their site with just a plain text editor can cope when their site gets larger, and they have multiple menus and many files in different directories. To manually edit and upload files across all these different directories would become a difficult task indeed!

The requirement

My requirement was to have a site with maybe 30 or so html pages. Having them share a single style sheet was no problem, but how could I get them to share a common set of menu options and a common footer? If I were to include the same menu and footer at the top of each page, what would happen if I wanted to add a new menu item? I'd have to either manually edit every file, or write a script to do a series of complex search and replace operations on all the files. Then we had the matter of validating the files for compliance with web standards - would I have to individually verify each file? Surely there had to be an easier way!

If I were to write the whole web site in a single text file, then I could validate every single page simply by validating a single large source file. It would enable me to spell check it, print it, or upload it as a single document. All that should happen then is that the header and footer from this file should be cut out and added to the beginning and end of each of the various output files.

I figured that Perl would be a good language to use. Not because I love Perl, or because I wanted something that would easily run both on my Windows and Linux boxes, but because some years ago I forced myself to sit through O'Reilly's Learning Perl with the notion that one day I might actually use those skills. Admittedly, I'd remembered very little of the book, but I was able to pick it up and hack together some very sloppy code which roughly did what I wanted.

Before writing htmlcutter I did spend a long time trawling the net through the endless number of scripting sites, but found that nobody else was even remotely attempting to do what I was. Was I just crazy, or was the world just busy cranking out horrendous HTML with their web site authoring packages? For this reason I didn't even mention that I'd written htmlcutter, figuring that it was probably a huge waste of time anyway. But the other day, shortly before I sat down to start writing this drivel, I figured "what the hell, why shouldn't I put it online", and maybe, just maybe, someone, somewhere might get some use out of it. So if this someone is you - please drop me a line and let me know that you find it useful - I'd get a kick out of knowing I'm not crazy after all...

What does it do?

So you just skipped my explanation above and started reading from this point didn't you? Ok, I'll give you the quick summary then. htmlcutter will take a single .html file and cut it up into a series of smaller files. It does this by looking for special commands, which appear like HTML comments, within the file. These commands tell htmlcutter at which points it should "cut" the code into a new file. Perhaps at this point we should explain by way of example.

Consider that we have a theoretical file called index-source.html:

<html>
<head>
  <title>Boring web site...</title>
  <link href="style.css" rel="stylesheet" type="text/css" />
</head>
<body>
  <div id="menu">
    <a href="index.html">Home</a>
    <a href="page1.html">Page 1</a>
    <a href="page2.html">Page 2</a>
  </div>

  <div id="main">
<!-- [HTMLCUTTER] filename=index.html -->
    <p>Hi everyone, welcome to the web site, use the links above to navigate.</p>
    <p>That is all, move along please, nothing more to see.</p>
<!-- [HTMLCUTTER] filename=page1.html -->
    <p>Welcome to page 1 - didn't we tell you there was nothing to see?</p>
<!-- [HTMLCUTTER] filename=page2.html -->
    <p>Welcome to page 2 - you still don't believe us?</p>
<!-- [HTMLCUTTER] FOOTER -->
  </div>

  <div id="footer">
    <p>All rights reserved, come again soon</p>
  </div>
</body>
</html>

htmlcutter will now cut our file into 3 files: index.html, page1.html and page2.html. It will assume that everything above the first [HTMLCUTTER] filename= command is the header and will start each output file with this header. It will also assume that everything below the [HTMLCUTTER] FOOTER command is a footer, and will append it to the end of each output file.
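For the curious, the cutting rule described above can be sketched in a few lines. This is not the htmlcutter source (which is Perl), just a rough Python illustration that assumes only the two commands shown in the example, filename= and FOOTER. The names MARKER and cut are made up for this sketch, and any other [HTMLCUTTER] command is simply passed through as ordinary text.

```python
import re

# Assumed marker syntax, taken from the example file above:
#   <!-- [HTMLCUTTER] filename=page1.html -->
#   <!-- [HTMLCUTTER] FOOTER -->
MARKER = re.compile(r"^\s*<!--\s*\[HTMLCUTTER\]\s*(.*?)\s*-->\s*$")


def cut(source):
    """Return {filename: page text} for one combined source document."""
    header, footer = [], []
    pages = {}        # filename -> list of body lines
    current = header  # everything before the first marker is the header

    for line in source.splitlines(keepends=True):
        match = MARKER.match(line)
        command = match.group(1) if match else ""
        if command.startswith("filename="):
            # Start collecting the body of a new output file.
            current = pages.setdefault(command[len("filename="):], [])
        elif command == "FOOTER":
            # Everything after this marker is the shared footer.
            current = footer
        else:
            # Ordinary content (or a command this sketch does not know).
            current.append(line)

    # Each output file is header + its own body + footer.
    return {name: "".join(header + body + footer)
            for name, body in pages.items()}
```

Calling cut() on the text of index-source.html would give a dictionary with three entries, one per filename= command, which you could then write out to disk however you like.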

Pretty simple? Of course it is, I didn't have time to write something decent, that's why this program is so basic. So to cut a long story short we now have some output files which look like this:

index.html

<html>
<head>
  <title>Boring web site...</title>
  <link href="style.css" rel="stylesheet" type="text/css" />
</head>
<body>
  <div id="menu">
    <a href="index.html">Home</a>
    <a href="page1.html">Page 1</a>
    <a href="page2.html">Page 2</a>
  </div>

  <div id="main">
    <p>Hi everyone, welcome to the web site, use the links above to navigate.</p>
    <p>That is all, move along please, nothing more to see.</p>
  </div>

  <div id="footer">
    <p>All rights reserved, come again soon</p>
  </div>
</body>
</html>

page1.html

<html>
<head>
  <title>Boring web site...</title>
  <link href="style.css" rel="stylesheet" type="text/css" />
</head>
<body>
  <div id="menu">
    <a href="index.html">Home</a>
    <a href="page1.html">Page 1</a>
    <a href="page2.html">Page 2</a>
  </div>

  <div id="main">
    <p>Welcome to page 1 - didn't we tell you there was nothing to see?</p>
  </div>

  <div id="footer">
    <p>All rights reserved, come again soon</p>
  </div>
</body>
</html>

page2.html

<html>
<head>
  <title>Boring web site...</title>
  <link href="style.css" rel="stylesheet" type="text/css" />
</head>
<body>
  <div id="menu">
    <a href="index.html">Home</a>
    <a href="page1.html">Page 1</a>
    <a href="page2.html">Page 2</a>
  </div>

  <div id="main">
    <p>Welcome to page 2 - you still don't believe us?</p>
  </div>

  <div id="footer">
    <p>All rights reserved, come again soon</p>
  </div>
</body>
</html>

Notice that we were careful to make sure that the <div id="main"> line came as the last line of the header, and the closing </div> was the first line of the footer? This is so that this block was wrapped up nicely in each file. Of course you may not wish to have a block defined like this, and since you are following this far you obviously know a little bit about HTML in which case we aren't going to presume to tell you how to write your pages here...
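If you want to convince yourself that a block really is wrapped up properly in every output file, a quick tag-balance check will do it. The sketch below is not part of htmlcutter. It is a small Python helper (BalanceChecker and is_balanced are names invented here) built on the standard html.parser module, and its list of void elements is deliberately short, so treat it as a sanity check rather than a validator.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag; a short, non-exhaustive list.
VOID = {"link", "meta", "br", "img", "hr", "input"}


class BalanceChecker(HTMLParser):
    """Track open tags and record any closing tag that does not match."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append("unexpected </%s>" % tag)


def is_balanced(html):
    """True if every opened tag is closed in order and nothing is left open."""
    checker = BalanceChecker()
    checker.feed(html)
    checker.close()
    return not checker.errors and not checker.stack
```

With the header ending in the opening div and the footer starting with the closing one, every generated page should pass. Move either line to the wrong side of a marker and the check fails for each output file.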

So now you see the basic function of htmlcutter. The next step is to write your input file, which you can call whatever you like. In our next example we'll use index-source.html as our input file. Simply run htmlcutter like this:

[root fyvie.net]# ./htmlcutter.pl index-source.html

If everything is ok, you'll see no output from the program and when you look at your directory you'll see the newly created output files. Of course the example above was from Linux, so in Windows you aren't going to put a ./ in front of the command, and if you don't have Perl in your path, then you'll have to write the full path to the Perl executable first.

Documentation?

I put the documentation into the file itself, so this way if I add features I only have to document them there, and nowhere else. I add features from time to time whenever I need something new that htmlcutter can't do. So the short answer on where to find the documentation is to use the --help command line parameter like this:

[root fyvie.net]# ./htmlcutter.pl --help

You'll find all the various other features described in there. A brief summary of some of these features:

Download htmlcutter

You can download htmlcutter here.

You may also need to modify the first line of the file if your Perl executable is somewhere else. But you knew that already.

Change history

Version 1.10 (10.10.2005) - Initial public release.

Version 1.11 (11.03.2006) - Added ability to use meta description.

Current version is 1.11

Perl for Windows users

If you are a Windows user and are scratching your head wondering what all this Perl business is about - don't worry. It's very easy. All you need to do is download ActivePerl from ActiveState. It takes only a few minutes to install. If you are running another operating system and want to get Perl, please search for it in Google.

Comments and suggestions?

Since this is a tool that I wrote for myself I'm curious to know if you find it useful, if you have any suggestions, or if you've made your own enhancements to it. Please let me know.
