Friday, August 29, 2014

TOOLS FROM MY CHEST

from The Last Supplement to the Whole
Earth Catalog (1971); co-editor Paul Krassner's
response to Ken Kesey's 17-page screed
entitled "Tools From My Chest"

( www.ep.tc/realist/89 )

    "As the UNIX system has spread, the fraction of its users who are skilled in its application has decreased."

      — Kernighan & Pike, "The UNIX Programming Environment" (1983)

I'm a software guy, so my tools are software tools. Over the years I've picked up tools that have stayed with me; some I found and adapted to my uses, while others I created myself. This blog entry is about some of the found tools — as well as tools for building tools (meta-tools?) — which have become my favorites.

The Search for Portability

Keeping my favorite tools around has been no easy feat, and frankly, sometimes I'm astonished that I was able to do it at all. You see, throughout my computer career (which has spanned about 60% of the history of commercially available computers), there has been an incessant struggle by vendors to "lock in" customers to proprietary systems and tools. This tended to work for them in the short term, but in the long term they almost all failed, leaving orphaned software at every turn as the proprietary systems became extinct. The only force working counter to this was the occasional heroic struggle of the users, first by banding together to demand standards, and later by writing and giving away free standards-based software. These brave efforts have kept the overall trend towards extinction of tools partially at bay.

At first I didn't understand how important this struggle was. I began programming in the era of proprietary operating systems: OS/360 on IBM System/360 mainframes, RSTS on DEC PDP-11 minicomputers, AOS on Data General minicomputers, plus the toy Apple DOS disk operating system on the Apple II personal computer, and the equally toy MS-DOS from Microsoft which ran on PCs. I wrote several productivity tools during each of these technical epochs, and all were lost to the sands of time.

This began to bother me. I was pretty sure my dad had tools in his physical workshop dating from the 1940s, such as screwdrivers, chisels and alligator clips, that all still worked fine. Why couldn't I keep the tools I built myself from one decade to the next?

When I first used UNIX in 1983 I was elated to find an OS that was somewhat standard, and ran on multiple hardware platforms. Soon I began to appreciate its robust set of small tools that can be recombined to quickly solve problems, and its tendency to hide hardware details from programmers, and eventually I also greatly appreciated its universal presence on nearly every hardware platform. But, honestly, the big deal at the beginning was that so many people I thought were smart were gung ho for UNIX.

A Voice in the Wilderness


K&P

I think I first heard about UNIX from some of my smart, geeky friends in the late 1970s, but I do believe the first hard information I got about it came from a book I read around 1980, at the recommendation of my friend Wayne. It was "Software Tools" (1976) by Brian W. Kernighan and P. J. Plauger.

The funny thing was that it hardly mentioned UNIX at all, but its co-author Brian Kernighan was tightly wound into the social network at Bell Labs which produced UNIX.

I have been re-reading this book this month, and I am amazed to find how much of its teachings have been deeply integrated into my thinking about software. It mentions UNIX in only two places I could locate, as an example of a well-designed operating system, while explaining how to work around the limitations of poorly-designed systems. This understated praise made UNIX seem uber-cool.

When I'm reading a book I own I usually note interesting quotations with notes in the inside front cover, with a page number and some identifying words. In a typical book I like I'll list a handful of quotes. In this one there are over thirty. It's hard to pick just a few, but here are some I really like.

Whenever possible we will build more complicated programs up from the simpler; whenever possible we will avoid building at all, by finding new uses for existing tools, singly or in combination. Our programs work together; their cumulative effect is much greater than you could get from a similar collection of programs that you couldn't easily connect.

* * * * * *

What sorts of tools [should we build]? Computing is a broad field, and we can't begin to cover every kind of application. Instead we have concentrated on an activity which is central to programming — tools that help us develop other programs. They are programs that we use regularly, most of them every day; we used essentially all of them while we were writing this book.

* * * * * *

How should we test [a program to count the lines in a file] to make sure it really works? When bugs occur, they usually arise at the "boundaries" or extremes of program operation. These are the "exceptions that prove the rule." Here the boundaries are fairly simple: a file with no lines and a file with one line. If the code handles these cases correctly, and the general case of a file with more than one line, it is probable that it will handle all inputs correctly.
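That advice is easy to try out against the standard UNIX line counter, wc. The file names below are my own, and note one subtlety: wc -l counts newline characters, so a file must end in a newline for its last line to count.

```shell
# Boundary cases for a line counter: the empty file, the one-line
# file, and then the general case
printf '' > empty.txt
printf 'only line\n' > one.txt
printf 'a\nb\nc\n' > many.txt

wc -l < empty.txt   # 0
wc -l < one.txt     # 1
wc -l < many.txt    # 3
```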

In addition to these general principles, I learned to appreciate any operating system that can string programs together, such as with the "pipes" mechanism of UNIX.
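As a small taste of what pipes make possible, here is the classic word-frequency pipeline, with each stage a separate program doing one job. The sample sentence is my own.

```shell
# Count word frequencies in a sentence, most frequent first
echo 'to be or not to be' |
  tr ' ' '\n' |   # put one word on each line
  sort |          # group identical words together
  uniq -c |       # collapse each group to a count and the word
  sort -rn        # order by count, descending
```

None of the four programs knows anything about the others; the shell glues them together.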

So by the time I actually got to use UNIX, I already knew a little about what made it so awesome.

Joining the UNIX Tribe


UNIX license plate ( www.unix.org/license-plate.html )

Before the World Wide Web (WWW), it was hard to learn about computers. It took a rare genius to be able to simply read the manuals and figure out computer hardware and software. Most of the time a tribe would form around an installation, with oral traditions and hands-on demonstrations being passed on to those who were less adept at learning from the manuals. In many cases the field technical staff for the computer vendor would come in to do demos and training, and pass along the oral tradition that way, and so there didn't have to be a "seed" person who could figure it out by reading alone. I remember asking the hypothetical question of whether it would be possible for someone in an igloo with a generator to figure out a computer all alone.

I was fortunate when I was learning to use UNIX to have just started a new job where I had access to three very knowledgeable and clever people named Dan, Phil and Jeff.

Dan was a full-time employee of the company, and the person who helped me get the job. He had been adamant that the company get UNIX, accepting no other option. (He had a "Live Free Or Die — UNIX" license plate similar to the one shown above, a variation of the New Hampshire state motto, on display in his office.) They ended up buying a VAX minicomputer from DEC, but getting the operating system from a third-party vendor, Uniq, which sold them a version of Berkeley Software Distribution (BSD) UNIX. It was on this system that I began to learn.

Dan gave me the most hands-on help, setting me up with a login, teaching me about the tricky but vital $PATH and $TERM variables and the squirrely nature of backspaces, teaching me how to customize my prompt and giving me templates for shell aliases and C programming language source code (more on those in a minute).

He also taught me something I've never seen written down anywhere: for about 50% of the common UNIX commands (I counted), the two-letter abbreviation is formed by the first letter followed by the next consonant. So:

  • archive becomes ar
  • copy becomes cp
  • c compiler becomes cc
  • link becomes ln
  • list becomes ls
  • move becomes mv
  • names becomes nm
  • remove becomes rm
  • translate becomes tr
  • visual editor becomes vi
Etc. Dan also explained to me that UNIX began in the minicomputer era when most people used teletypes, which printed on paper, to communicate with timesharing systems, and each keystroke took time and energy, so the commands were made as short as possible. (We used to have a joke that you were born with only so many key presses to use throughout your life, so you had to conserve them.)
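Just for fun, Dan's rule can be sketched as a little shell function of my own devising, built from standard utilities. It only handles the words where the rule actually applies (as Dan said, about half of the common commands).

```shell
# Dan's rule: the first letter, then the next consonant after it
abbrev() {
    first=$(printf '%s' "$1" | cut -c1)
    # drop the first letter, then strip any run of leading vowels
    rest=$(printf '%s' "$1" | cut -c2- | sed 's/^[aeiou]*//')
    printf '%s%s\n' "$first" "$(printf '%s' "$rest" | cut -c1)"
}

abbrev copy       # cp
abbrev move       # mv
abbrev list       # ls
abbrev remove     # rm
abbrev translate  # tr
abbrev link       # ln
```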

A similar impulse encouraged the liberal use of aliases, which allow a long command to be represented by a short abbreviation, such as 'd' in place of 'ls -CF' (list files in columns, with file-type markers). Dan gave me my first set of aliases; more on that below.

Through Dan's social network of UNIX experts the company found Phil and Jeff, who worked as consultants. I didn't see them often — they both frequently worked remotely from nearby UCSD — but they were also very helpful.

Phil helped me understand the history and philosophy of UNIX. He told me how most hardware manufacturers would rush their software people to finish an operating system as soon as possible, so they could start shipping hardware for revenue. At Bell Labs, UNIX was developed slowly and lovingly over a period of about a dozen years. There was no time pressure because AT&T (also known as "the phone company") was still a government-granted monopoly for voice traffic, and to keep them from having an unfair advantage in other communications they were prohibited from selling computer hardware or software. UNIX was originally for internal use only at Bell Labs.

He also explained that every program was designed to do one thing well, and be interconnected with others. One important principle was that any program shouldn't know — or care — whether a human or another program was providing its input. This is why programmers were encouraged to have output be terse, or sometimes provide none at all. The standard UNIX report from a successful operation is no output. For example, the program diff lists the differences between two files. No differences results in no output. In the same vein, on most traditional mainframe computers, if you connected to the command console and pressed the Enter key without typing a command, you got an error something like "ERROR #200232 -- ZERO LENGTH COMMAND" and then another prompt. In UNIX you just get the prompt. The assumption is you know what you're doing. Maybe you just wanted to see if the system is "up," or how responsive it is (a relic of the timesharing era which is also useful when you run large simultaneous jobs on a PC).
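The diff example is easy to see at a shell prompt. Besides staying silent, diff also reports through its exit status, which is what lets other programs consume its answer; the file names here are my own.

```shell
# Success is silent: identical files produce no output, exit status 0
printf 'alpha\nbeta\n' > a.txt
cp a.txt b.txt
diff a.txt b.txt && echo "no differences"

# A difference produces output and a nonzero exit status
printf 'alpha\ngamma\n' > c.txt
diff a.txt c.txt || echo "files differ"
```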

Jeff was the most terse. He was very busy writing original software for the "kernel" of an embedded system, which is very gnarly work, and usually had a lot on his mind. I'd ask him, "How do you do X?" and he'd usually say "I am not a manual." Once I said, "But there's six feet of manuals on the shelf; I don't have time to read them all." "Use man," he replied. "What's that?" I asked. "man man" he said cryptically. But when I typed "man man" into a command prompt, I got something like this:


man(1)                                                                  man(1)

NAME
       man - format and display the on-line manual pages

SYNOPSIS
       man  [-acdfFhkKtwW]  [--path]  [-m system] [-p string] [-C config_file]
       [-M pathlist] [-P pager] [-B browser] [-H htmlpager] [-S  section_list]
       [section] name ...

DESCRIPTION
       man formats and displays the on-line manual pages.  If you specify sec-
       tion, man only looks in that section of the manual.  name  is  normally
       the  name of the manual page, which is typically the name of a command,
       function, or file.

......

Jeff taught me to be self-sufficient. He would help me if I had already tried to help myself, and hit a blockade. When I told him I couldn't find a "search" or "find" feature, he said "man grep" and that did the trick.

Nuts and Bolts

There are some specific tools that I learned 30 years ago which I still use frequently today as I earn my daily bread. They include:

  • the vi (visual) editor

    ( en.wikipedia.org/wiki/Vi )

    The creation myth is that nearly every minicomputer with a teletype interface had a "line editor," which in the UNIX world was called ed, and it evolved into ex, the "extended editor." When cathode ray tube (CRT) terminals became cheap enough for widespread use, it further evolved into vi, the "visual editor."

    Much to my astonishment I have been able to use vi, with nearly exactly the same features, on every computer I've owned for the last two decades.

    It's also been part of the evolution of other UNIX tools. As the Wikipedia article on ed explains: "Aspects of ed went on to influence ex, which in turn spawned vi. The non-interactive Unix command grep was inspired by a common special use of qed and later ed, where the command g/re/p means globally search for the regular expression re and print the lines containing it. The Unix stream editor, sed implemented many of the scripting features of qed that were not supported by ed on Unix. In turn sed influenced the design of the programming language AWK — which inspired aspects of Perl."
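The g/re/p idiom survives almost verbatim in sed today. A quick sketch, using a sample file of my own making:

```shell
printf 'one fish\ntwo fish\nred bird\n' > sample.txt

# ed's "g/re/p" as a sed one-liner: -n suppresses the default
# printing of every line; /fish/p prints only the matching lines
sed -n '/fish/p' sample.txt

# the command named after the idiom gives the same answer
grep 'fish' sample.txt
```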

    For more details see "A History of UNIX before Berkeley: UNIX Evolution: 1975-1984."

    ( www.darwinsys.com/history/hist.html )

  • the C programming language

    ( en.wikipedia.org/wiki/C_language )

    The other main thing these three guys got me started with was the C programming language. C is infamous for being a language that many consider too primitive, or too close to "the metal," and of course it predates the "object-oriented" revolution, but it is just about perfect for implementing UNIX. For this reason and others it probably will last a very long time.

    Dan gave me a C code template which I use to this day. It could use updating, but what the heck, the computer doesn't care, and it still works.


    IQ Tester ( www.lehmans.com/p-4542-original-iq-tester.aspx )

    This program I wrote to solve an "IQ Tester" puzzle in 2007 follows a template that evolved out of the template Dan gave me in 1983. (In other words, it has improvements I've made along the way.) And it runs fine on Windows, Mac and UNIX/Linux!

    The definitive book on C is by the ubiquitous Brian W. Kernighan, coauthoring with Dennis Ritchie who actually invented C: "The C Programming Language" (1978).

  • the C shell (csh) interface and scripting language


    nautilus icon for csh in old SunOS UNIX
    ( toastytech.com/guis/sv4112.html )

    ( en.wikipedia.org/wiki/C_shell )

    Phil explained to me that the "kernel" of UNIX had an application program interface (API) so application programs could "call" into the operating system. Then there were "layers" that could "wrap around" the kernel, each adding a new "shell" which was a higher-level interface. The symbol was a nautilus shell with its logarithmic chambers. The term came to mean a text-based scripting interface to the operating system. One of the earliest and most influential was Steve Bourne's shell, which we now call the "Bourne Shell" but which he called simply sh, in fine UNIX naming tradition.

    It was the famous Berkeley port (and re-invention) that introduced the "C Shell," csh, a fine pun in the UNIX humor tradition. This was the first shell I learned, and I stick with it if I can. These days I'm often forced to use bash, the "Bourne-Again Shell," which is based on the Bourne Shell and runs easily on Windows and Mac.

    I'm writing this using vi, and I can "bang-bang out" to a command for my local bash (precede it by two exclamation points). Here, I'll make a deliberate error:

        !!foo
    
        /bin/bash: foo: command not found
    

    That was my local bash responding, right into my text document. It seems like I can't live without these things.

    Perhaps the most mind-blowing thing Dan did for me was to teach me the alias mechanism in the C Shell. By way of example, I can type the echo command with arguments:

        echo hello there
    
    and I will get back the "hello there" from my local C Shell. Or I can type:
        alias ht echo hello there
    
    and it will appear that nothing has happened, but henceforth when I type "ht" to my shell I will get "hello there" back. I have created the alias ht and it's like my own personal custom UNIX command.

    Next, Dan typed this command for me to ponder:

        alias a alias
    
    What do you think it does?

  • shell tools (UNIX utilities)

    In preparing to write this blog I talked to family and friends about the topic. "It's going to be about UNIX shell tools!" I would gleefully state, and I kept getting that same glassy-eyed stare that I'm used to getting if I bring up any of the following topics:

    • "The Eastern Question" in nineteenth-century European politics, dealing with instabilities caused by the collapse of the Ottoman Empire,

    • the Laplace Transform in advanced calculus, economics and engineering, and

    • one's own personal medical problems

    And yet I persist. This stuff has been massively useful to me, and I'm just trying to follow the "hacker ethic" (white hat version) and share my best tips and tricks. I take it on faith that there is someone out there who wants and needs this information.

    I make frequent use of a number of UNIX shell tools (i.e., software that can be invoked from a UNIX shell or equivalent); my most frequently used ones are listed near the end of this post.

Thankfully, much of the oral tradition of the UNIX shell tools was captured in the still-useful book "The UNIX Programming Environment" (1983) by Brian W. Kernighan & Rob Pike.

If you do find yourself in that igloo with a generator it's nice to know it's there.

Using What I'd Learned

In 1985 I made a mistake I would never repeat in this century: I quit one job before another was firmly in hand. When the new opportunity slipped away I not only faced the inconvenience of having to find another job while earning no money, I found myself going through UNIX withdrawal. There had been a time when I didn't know what UNIX was; now I couldn't live without it. I recalled that my email and conferencing account at "The Well" included a C shell login, and I dialed up just to edit my aliases with vi for old time's sake.

Then when I got another job it was on a project maintaining some crufty FORTRAN code on a clunky little Hewlett-Packard minicomputer with a poorly-designed proprietary operating system whose name I don't even remember. A book came to my rescue: good old "Software Tools."

I didn't have the time to implement the "Ratfor" (Rational FORTRAN) pre-processor provided in the book, but I did manage a few pieces, including comment processing (FORTRAN comments have rigid requirements for placement in column numbers), so I could be a little sloppier in my editing and still produce working code quickly. The operating system didn't have UNIX pipes, but I hacked together a work-around using suggestions from the book. And I wrote a "find" program, which helped me make an automated index of all the function and subroutine calls in the code base, which had never been done before. This set of strategies made my life much easier and only confirmed the productivity benefits of UNIX in my mind.
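On a modern system, the call-indexing trick reduces to a grep one-liner. This sketch uses invented file names and toy FORTRAN code, but the idea is the same as my 1985 program:

```shell
# Hypothetical FORTRAN sources, just to demonstrate the indexing trick
cat > main.f <<'EOF'
      PROGRAM MAIN
      CALL INIT
      CALL REPORT
      END
EOF
cat > util.f <<'EOF'
      SUBROUTINE INIT
      CALL REPORT
      END
EOF

# An automated index of subroutine calls: file, line number, call site,
# sorted by the call text so all calls to one routine group together
grep -in 'call ' main.f util.f | sort -t: -k3
```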

The Gift of LINUX


Linus Torvalds stamp
( uncyclopedia.wikia.com/wiki/Linus_Torvalds )

For a while I used tools on a variety of "workstations" from DEC, IBM, Sun, HP, SGI, and others, and my tools moved with me from one to the next. The problem was most of these systems cost in excess of $40,000, and I couldn't afford one myself. It was only at work that I had access to UNIX. Then with the arrival of Windows 95 it looked like UNIX was destined for the scrap heap, overtaken by another proprietary OS on a dirt cheap hardware platform. I worried that my favorite OS and its whole ecology were going to become extinct before my eyes.

And along came Linux, the open-source freeware UNIX work-alike that has since taken over the world. I couldn't be happier about the way this has worked out. My old tools have new life. I'd like to emphasize that I had practically nothing to do with it, except being a cheerleader and a user. I am extremely grateful for all the talented people in the open source world who have made it possible for me to keep using my favorite tools on modern computers.

Sometimes I think the greatest impediment to more widespread Linux usage is the difficulty of pronouncing it correctly. Linux creator Linus Torvalds has a name that sounds like a popular American comic strip character, Linus from "Peanuts," but the comic character has a long I (as in "like") while the Finnish programmer has a short I (as in "lick"), and so does Linux. Perhaps a re-branding is needed.

Portability Survival Skills


19th century interface adapter
( ridevintage.com/railway-bicycles/ )

Only in the last twenty years have I had Linux on a home computer. Meanwhile, my wife uses Windows at home (Windows 7 at this point), but I prefer Mac for my primary system. But during these same two decades I have always had some form of Windows I was required to use in my work. In that world what has come to my rescue is the tool set called Cygwin from Cygnus.

One of the three founders of Cygnus Solutions is a friend of mine; imagine my surprise to spot him posed with the other two in a hot tub on the cover of "Forbes" magazine in 1994.

The Cygwin tools give you most of the tools I describe above on Windows platforms, and they're free. I always load them right away whenever I am issued a PC.

Since 1996 I have been investing my time in the technologies of HTML and Java for their portability. I create content (such as this blog) in raw HTML using vi, confident that many other programs can read it.

The appeal of Java was the "write once, run anywhere" philosophy, which mobilized Microsoft to embrace and sabotage Java, creating a "write once, test everywhere" requirement in real deployment. That battle seems to be over now, and instead we're watching Oracle, which bought Java's creator Sun Microsystems, fumble the franchise in creative new ways. Still, I have Java code that runs on Windows, Mac and Linux that I continue to maintain and use.

Vindication


finally, a book that agrees with me

Lest you think that I'm this fossil who can only write C code to process text, let me assure you that I've kept up with the changing world of software development, and I've used modern languages such as Objective-C, JavaScript and PHP, Microsoft tools like Visual Basic and Access, Integrated Development Environments (IDEs) such as JBoss and Visual Studio, Graphical User Interface (GUI) frameworks such as the X Window System, Microsoft Foundation Classes, and Java Swing, and cutting-edge development methodologies such as Software Patterns and Agile Development.

But at a point a few years back when I did some soul-searching on my "core competencies," I ran across a rule of thumb from the book "Outliers: The Story of Success" (2008) by Malcolm Gladwell.

He said that to become a master at something you have to put in 10,000 hours of practice over the course of about 10 years. I realized the one thing I've done that long is code in C.

Amazingly, around the time I began to embrace my inner C programmer, I discovered that, at least for a few months, C was the world's most popular programming language, having experienced a new renaissance.

Eventually it seemed like history caught up with me. All the people who bet on Visual Basic had to start over with C Sharp, as did all the people who bet on Visual C++, but the UNIX/C code base just keeps on working. I was gratified that books began to emerge by people who shared my views:

  • "In the Beginning Was the Command Line" (1998) by Neal Stephenson

    This delightful manifesto by cyberpunk sci-fi author Neal Stephenson of "Snow Crash," "Diamond Age" and "Cryptonomicon" fame — one of the few SF authors I know of who can actually program — explains why real programmers still mostly use their keyboards, and there is no "royal road" to clicking your way into software development.

    His analogy of Macs to sports cars, Windows to clunky station wagons, and Linux to urban tanks, is priceless.

  • "The Pragmatic Programmer: From Journeyman to Master" (1999) by Andrew Hunt and David Thomas

    It is a rarity for me to read a book and be exclaiming "Yes!" with nearly every page, but it happened with this one. It was even more exciting when I got into the stuff I didn't know about, because by then I trusted the authors completely.

    One of the most important principles espoused here is "Don't Repeat Yourself" (DRY), which often requires code that writes other code. This has long been one of my favorite tricks, and its powers are almost magical. If used correctly it can prevent whole classes of errors as well as the tedium of hand-coding highly redundant source code.
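A minimal sketch of code writing code, with the file, field, and struct names invented for illustration: a one-column list of names drives the generation of repetitive C declarations, so adding a field means editing one list, not four files.

```shell
# The single source of truth: a list of field names
printf 'width\nheight\ndepth\n' > fields.txt

# Generate one C accessor declaration per field
awk '{ printf "double get_%s(const struct box *b);\n", $1 }' \
    fields.txt > getters.h

cat getters.h
```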

  • "Data Crunching: Solve Everyday Problems Using Java, Python, and more" (2005) by Greg Wilson

    This follow-on book by the same publisher deals with the very issues I grapple with weekly: approaching some unknown input file with a need to rationalize, "cleanse" and reformat its contents.

Moving Forward


from article: "Linux Now Has 'Double' the Market Share of Windows"
( www.tomshardware.com/news/linux-windows-microsoft-android-ios,20220.html )

The new year brought a new client for my consulting, and I once again found myself having to use nearly all the tools in my chest to get jobs done quickly. As Kernighan & co. pointed out more than three decades ago, a lot of what passes for programming involves processing text files in predictable ways. I keep encountering the same "patterns" in the problems I solve, and there's almost always a "sort" involved.

A lot of kibitzers tell me there are other ways to solve my problems, but they can't seem to get things done as quickly as I can with the UNIX tools.

I went "grepping" through some of the projects I've worked on in the last six months, and I found, in addition to programs I wrote in C, Java, Python and the 'R' statistics language, I used these shell tools frequently:

  • awk
  • cat
  • cp
  • echo
  • grep
  • head
  • join
  • mv
  • python
  • rm
  • sed
  • sort
  • tail
  • uniq
  • vi
  • wc
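The sort-centered patterns I keep encountering usually combine several of the tools above. Here is one such pattern, merging two data sets on a common key with join, using tiny data files I invented for the example (join, like many UNIX tools, expects its inputs sorted on the key):

```shell
# Two small files keyed on a user id
printf '102 bob\n101 alice\n103 carol\n' > names.txt
printf '103 guest\n101 admin\n' > roles.txt

# join requires sorted input; sort each file in place
sort -o names.txt names.txt
sort -o roles.txt roles.txt

# Merge the records that share a key (a relational join in one command)
join names.txt roles.txt
```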

And I endeavor to continue to learn. For about four years I have been playing with Python, and I find it quite promising. It doesn't hurt that its name comes from Monty Python's Flying Circus, the British comedy troupe, and not a snake. And just this year I finally began to dig into an old UNIX favorite: the text processing program called awk.

It's named after the three people who created it: Alfred Aho, Peter Weinberger, and... wait for it... the ubiquitous Brian W. Kernighan.
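To give a taste of why awk rewards digging into: it splits each input line into fields and does arithmetic for free. This sketch, on a data file of my own invention, totals quantity times price across a file in one line:

```shell
# Invented sales data: item, quantity, unit price
printf 'widgets 3 2.50\ngadgets 5 1.25\n' > sales.txt

# $2 and $3 are the second and third whitespace-separated fields;
# END runs after the last line of input
awk '{ total += $2 * $3 } END { printf "%.2f\n", total }' sales.txt
```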

Further Reading

One of the best ways I have found to absorb a new computer language quickly is to study very short programs or scripts. So-called "one-liners" pack a lot into a small space. The key is to find a tutorial web site that has a maniacal obsession with explaining every little nuance of each one-liner.

Another great resource is the "Stack Overflow" web site.

Using a clever combination of curating, moderating and crowdsourcing they maintain a highly reliable and timely question and answer site for programming. Many of my questions have been answered there. (Note to self: maybe it's time I gave something back.)

And of course there are good old books. Here are my recommendations for a well-rounded programmer:

  • "Computer Lib: You Can and Must Understand Computers Now" (1974) by Theodor H. Nelson

    I was fortunate to have this highly educational comic book and manifesto to learn the inner guts of computing, just before starting my first job in the industry.

  • "The Mythical Man-Month: Essays on Software Engineering" (1975) by Frederick P. Brooks Jr.

    This classic of software management is still as relevant as the day it was published, even though it's based on the work done for a large IBM mainframe in the 1960s. Ask anybody.

  • "Hackers: Heroes of the Computer Revolution" (1984) by Steven Levy

    I firmly believe that in order to program well one must learn to think like a programmer, which means learning about so-called "hacker culture," including the "hacker ethic" and "hacker humor." Technical journalism superstar Steven Levy packages the essential history in this book. In addition it is useful to refer to the following lexicons:

    • "The New Hacker's Dictionary" (1996) by Eric S. Raymond (editor)

      A book form of a long-lived and highly-evolved computer file, with heavy contributions from Stanford and MIT.

    • "The Devil's DP Dictionary" (1981) by Stan Kelly-Bootle

      One man's sarcastic reworking of Ambrose Bierce's venerable "Devil's Dictionary" (1911).

      Dated by its mainframe point of view, but still hilarious and educational. It has the classic definition, "recursive — see recursive."

      From these sources you will learn that the name UNIX is a joke, derived from the Multics operating system developed by MIT, General Electric and Bell Labs.

  • "The Information: A History, A Theory, A Flood" (2011) by James Gleick

    Sometimes it's difficult to see the forest for the trees. As the Information Revolution has washed over us through the last 70 years — a human lifetime — it has changed nearly every aspect of our technical civilization. Gleick, an excellent technology journalist, puts the pieces together here with a nice long view. He also provides a good overview of the pioneering work of mathematician Alan Turing, and its relevance to computing today.


Disclaimer

I receive a commission on everything you purchase from Amazon.com after following one of my links, which helps to support my research. It does not affect the price you pay.


This free service is brought to you by the book:

A Survival Guide for the Traveling Techie
travelingtechie.com