Tilmanns Corner - sed

Quickstart:
--> Links
--> Apache log analyser
--> Mail ads filter
--> Mail to HTML
--> Pine addresses to Vim
--> Callerid to Vbox
--> Indexhtml
--> diff(1) HTML beautifier

Links

sed links, taken from the excellent sed FAQ at http://www.student.northpark.edu/pemente/sed/sedfaq.html

Eric Pement http://www.student.northpark.edu/pemente/sed/index.htm
Seders Grab Bag http://spazioweb.inwind.it/seders/
Sven Guckes http://www.math.fu-berlin.de/~guckes/sed/
Felix von Leitner http://www.math.fu-berlin.de/~leitner/sed/
Yiorgos Adamopoulos http://www.dbnet.ece.ntua.gr/~george/sed/

Apache log analyser

A very simple web log analyser that extracts the search words from Google and AltaVista referrers and marks them with an a for AltaVista or a G for Google. Download here: sedlog.gz
Usage:
  
  $ wget http://tibit.org/sedlog.gz
  $ gzip -d sedlog.gz
  $ chmod +x sedlog
  $ ./sedlog access.log | vim - 
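
The sedlog script itself is only available as a download, so the following is not its actual content. It is a minimal sketch of the same idea, assuming the Apache combined log format (referrer in the second double-quoted field) and assuming Google passes the query in a q= parameter. The function name search_words is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not the original sedlog): print "G <words>" for every
# log line whose referrer is a Google search URL carrying a q= parameter.
search_words() {
  sed '
    # reduce the line to the referrer, the second double-quoted field
    s/^[^"]*"[^"]*"[^"]*"\([^"]*\)".*/\1/
    # drop everything that is not a Google search with a query
    /^http:\/\/www\.google\.[a-z.]*\/search?/!d
    /[?&]q=/!d
    # isolate the value of q=, turn + into spaces, tag the line with G
    s/.*[?&]q=\([^&]*\).*/\1/
    s/+/ /g
    s/^/G /
  ' "$@"
}
```

Called as search_words access.log it reads the log file; without arguments it reads standard input. Percent-encoded characters in the query are left undecoded in this sketch.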

Mail ads filter

A sed script that I use as an incoming mail filter to get rid of the ads in some newsletters I am subscribed to. I haven't had any bad side effects so far ;-) It is used in conjunction with procmail; a sample procmail entry follows the script.
  
  # remove ads in billiger
  /+-.*-+/,//{
    /+-\{10,\}ANZEIGE-*+/{
    c\
  WERBUNG ENTSORGT
    }			;# if the pattern is found, which is the
    			;# last line of the ad, change it
  			;# to WERBUNG ENTSORGT ("ad disposed of").
  d			;# delete pattern-space.
  }
  
  # remove footer in billiger
  /^\*\**\*\ $/,/^$/d
  
  # remove gmx ads
  /^####*Anzeige###*$/,/^##*#$/{	;# // repeats last regexp
    //{			;# if // found 
    N			;# append next line to pattern-buffer
    /\n$/c\
  WERBUNG ENTSORGT
    }			;# if a \n is at the end of pattern-space
    			;# change it to WERBUNG ENTSORGT
  d			;# delete pattern-space
  
  }  

  :0fw:
  * ^From:.*\<gmxred@gmx\.net\>.*
  | sed -f $HOME/scripts/sed/deladds.sed >> /home/tibit/mail/.IN/gmx 
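
Before wiring such a filter into procmail, it helps to try the range-and-replace idea on a small sample. The sketch below is a simplified stand-in, not deladds.sed itself: the marker patterns, the function name strip_ads and the placeholder text are assumptions chosen for the demonstration.

```shell
#!/bin/sh
# Hypothetical demo: drop a block that starts with a "###Anzeige###" line and
# ends with a line made only of "#" characters, leaving one placeholder line.
strip_ads() {
  sed '
    /^##* *Anzeige *##*$/,/^###*$/{
      # on the closing line of the block, emit the placeholder instead
      /^###*$/c\
[ad removed]
      # every other line of the block is dropped
      d
    }
  ' "$@"
}

# Try it on a tiny fake newsletter.
printf 'Hello\n####Anzeige####\nBuy stuff\n##########\nBye\n' | strip_ads
```

Everything outside the marked block passes through untouched, which is the property to check before trusting a filter with real mail.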

Mail to HTML

This script converts emails to HTML, which is quite nice for printing. Note that I did not write it, I only modified it.
URL: mail2html.sed

Pine addresses to Vim

I use Pine as my mail client and Vim as my editor for writing mail. I wanted to use the email addresses defined in Pine's .addressbook file as abbreviations in Vim. The following script generates abbreviations of the form mm<NICKNAME>.
URL: pine_addr_2_vim_ab.sed
  
  # Format .addressbook to Vim abbreviations
  :redo;	# Define label redo
  $s/\([^	]*\)	[^	]*	\([^	]*\).*/ab mm\1 \2/
  N;
  /\n   /!{
    s/\([^	]*\)	[^	]*	\([^	]*\)[^\n]*\(\n\)/ab mm\1 \2\3/
    P
    D
  }
  s/\n//g
  s/   *//g
  b redo;
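
To see what the substitution produces, here is a reduced sketch that handles only single-line entries. It assumes the entry layout the script above implies (nickname, tab, full name, tab, address, optional further fields); the function name addr_to_ab and the sample address are made up.

```shell
#!/bin/sh
# Hypothetical helper: convert single-line .addressbook entries
# (nickname TAB full name TAB address ...) into Vim abbreviations.
addr_to_ab() {
  tab=$(printf '\t')   # a literal tab character for use inside the regex
  sed "s/^\([^${tab}]*\)${tab}[^${tab}]*${tab}\([^${tab}]*\).*/ab mm\1 \2/" "$@"
}

# One sample entry: nickname "joe", full name "Joe Doe".
printf 'joe\tJoe Doe\tjoe@example.org\n' | addr_to_ab
```

The sample prints ab mmjoe joe@example.org, so typing mmjoe in Vim would expand to the address once the line has been sourced. Continuation lines (those starting with three spaces) are not joined here; that is what the loop in the full script is for.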
    

Callerid to Vbox

Converts ISDN callerid.conf entries to the vbox format. WARNING: adjust the dialing prefix (711 in the script below) to your own.

  /^#/d
  /^$/d
  /^ZONE.*/d
  /^INTERFACE.*/d
  /^\[MSN\]/{
    N
    N
    s/^\[MSN\]\nNUMBER\ *=\ *//
    s/ALIAS\ *=\ *//
    s/\n/	-	/
    s/^+49//
    s/^+//
    /^[0-9]\{8,\}/!{
      /^711/!s/^/711/
    #   ^^^      ^^^ adjust
    }
    s/_/ /g
  }  
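
The prefix rule is the part that needs adjusting, so it may be easier to pass the prefix in from the shell. A minimal sketch of just that rule, assuming the prefix consists of digits only; the function name add_prefix is hypothetical and the rest of the conversion is left out.

```shell
#!/bin/sh
# Hypothetical helper: prepend a local dialing prefix to short numbers.
# Numbers starting with 8 or more digits are assumed to carry a prefix already.
add_prefix() {
  prefix=$1
  shift
  # single quotes around the sed text keep "!" away from interactive shells;
  # only the prefix itself is spliced in from the shell variable
  sed '/^[0-9]\{8,\}/!{
    /^'"$prefix"'/!s/^/'"$prefix"'/
  }' "$@"
}

# 1234 gets the prefix, the other two numbers stay as they are.
printf '1234\n7111234\n0301234567\n' | add_prefix 711
```

With the prefix as the first argument, the same function works for any area code without editing the sed text.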

Indexhtml

A pretty long script that generates an index of the links in an HTML file. I recently realized that when you print a web page you can't click on the links anymore ;-) I use it as a print filter for lpr. If sed complains about line 76, just use a different separator.
URL: indexhtml.sed

#!/bin/sed -f
# Thu May 18 12:43:45 CEST 2000 by tilmann@bitterberg.de
#
# Description:
# Creates an index of links from an HTML file.
# Does something similar to lynx -force_html -dump but
# leaves the document as HTML (generates an index of links)
# 
# Example: Input
# <HTML><HEAD></HEAD><BODY>
# foo1 <a Href="http://link.org">Click here</a> foo2
# </BODY></HTML>
#
# Output:
# <HTML><HEAD></HEAD><BODY>
# foo1 <a Href="http://link.org">[1] Click here</a> foo2
# <hr>[1] http://link.org<br>
# </BODY></HTML>
# 
# NOTE:
# 1) Will break at links like <A 
#                                HREF
# 2) Will only handle a fixed number of links (500 right now)
# TODO:
# - Remove limits mentioned above
# - let it handle weird HTML syntax


1{
  # Put numbers in holdspace at first line
  x
  s/^/ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 \
  27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 \
  52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 \
  77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 \
  101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 \
  120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 \
  139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 \
  158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 \
  177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 \
  196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 \
  215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 \
  234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 \
  253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 \
  272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 \
  291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 \
  310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 \
  329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 \
  348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 \
  367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 \
  386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 \
  405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 \
  424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 \
  443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 \
  462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 \
  481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 \
  500/
  # I don't like overly long lines, so the list is wrapped; join it again here
  s/\n//g
  s/  */ /g
  x
}

# Start at line 1
# Sometimes you only want to start from a pattern, so replace
# '1' with /PATTERN/
1,${
  /[Aa]  *[Hh][Rr][Ee][Ff] *= *"[^#]/{;  # don't look at internal links
    G;     # Get the numbers
    :loop; # there may be multiple links per line, so loop

    # the ||||| is used as a marker.
    # We now have: foo1 <a href="blah.html"> foo2\n 1 2 3 4 .. 500
    # using newline as separator to the 's' command and 'I' for casei
s
\(a  *href\) *= *\("\([^"]\+\)"[^>]*>\)\([^\n]*\(\n\)\) \([^ ]*\)\(.*$\)
\1|||||=\2[\6] \4\7\5[\6] \3<br>
I

#|----1----|     |----------2---------||-------4------| |---6---||--7--|
#                   |---3----|                 |--5-|    
# Field Contains:
# \1    a href
# \2    the link text up to the closing >
# \3    the link itself (http://foo.com)
# \4    the rest of the input line
# \5    a newline (\n)
# \6    the number we would like to use
# \7    everything up to the end of patternspace
#                       
# Now the line looks like:
# foo1 <a href|||||="blah.html">[1] foo2\n 2 3 4 .. 500\n[1] blah.html<br>
 
    t loop;        # look if there is another link in that line
    
    s/|||||//g;    # delete marker
    h;             # save how many numbers are used
    s/\n.*//;      # "restore" the original line
    x
    s/[^\n]*\n//
    x
  }
}

# Just before the </body> insert index
/<\/[Bb][Oo][Dd][Yy]>/{
  x;               # insert saved stuff
  s/[^\n]*\n//;    # delete unused numbers
  s/^/<hr>/
  G
}
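
The core trick of the script is keeping a list of unused numbers in the hold space and peeling off the first one for each link. A stripped-down sketch of that idea, with only five numbers, one link per line, a plain href match instead of the full anchor regex, and a made-up function name number_links:

```shell
#!/bin/sh
# Hypothetical demo of the hold-space counter: prefix each line containing
# "href" with the next unused number from a list kept in the hold space.
number_links() {
  sed '
    1{
      # put the list of numbers into the hold space on the first line
      x
      s/^/ 1 2 3 4 5/
      x
    }
    /href/{
      # append the list: pattern space becomes LINE, newline, list
      G
      # move the first number in front of the line; the newline is captured
      # in group 2 so it can be reused in the replacement
      s/^\(.*\)\(\n\) \([^ ]*\)\(.*\)$/[\3] \1\2\4/
      # save everything, then keep only the line in the pattern space
      h
      s/\n.*//
      # keep only the remaining numbers in the hold space
      x
      s/.*\n//
      x
    }
  ' "$@"
}

printf 'a href=x\nplain\na href=y\n' | number_links
```

Once the list is used up, further lines pass through unnumbered, which mirrors the fixed limit mentioned in the NOTE of the full script.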
  

diff(1) HTML beautifier

You can use it to generate nice-looking pages from diff output ("patches"). I have only tried it with unified diffs, since they are the only ones I use anyway. The script relies on the external utility expand(1), which converts tabs to spaces and should be found on any system.
URL: diffhtml.sed

#!/bin/sh
#
# Beautifies the output of diff -Nur as HTML.
# Needs the expand utility to convert tabs to spaces, to preserve
# indentation. The generated HTML is pretty bad, I think, and the
# colors suck somehow
#
# Sat Apr 14 21:59:36 CEST 2001
# by Tilmann Bitterberg

expand |
sed '
# escape ampersands first so they survive as text in the HTML
s/&/\&amp;/g
s/>/\&gt;/g;   s/</\&lt;/g
s|ü|\&uuml;|g; s|Ü|\&Uuml;|g; s|ä|\&auml;|g
s|Ä|\&Auml;|g; s|ö|\&ouml;|g; s|Ö|\&Ouml;|g
s|ß|\&szlig;|g

s/^+$/+ /
s/^-$/- /
s/^$/ /
/^[-+]/!s/$/<BR>/

1i\
<HTML><HEAD></HEAD><BODY bgcolor=white>

/^diff /{i\
<P>
s|[^/]*/||
s| .*||
s|.*|<font color=white><TT>&</TT></FONT>|
s|.*|<table border=0 cellspacing=0 width=100% bgcolor=#057205>\
<TR><TD>&</TD></TR></TABLE><P>|
}

# Could be done in one line
/^@@ /{
s|^@@|<font color=red><TT>@@|
s|$|</TT></FONT>|
}

\|^---|s|.*|<TT><font color=lightblue>&</FONT></TT><BR>|
\|^+++|s|.*|<TT><font color=darkblue>&</FONT></TT><BR>|

# Take care of removed lines
/^-[^-]/{i\
<table border=0 cellspacing=0 width=100% bgcolor=#eddcdc><TR><TD>\
<TT><FONT color=darkred>
:a
N
s/\n-/||||/
ta
h
s/.*\n//
/<BR>$/!s/$/<BR>/
x
s/\(.*\)\n[^\n]*$/\1/
s/||||/<BR>\
-/g
:space1
/  /{
  s|  |\&nbsp; |
  bspace1
}
s|$|</FONT></TT></TD></TR></TABLE>|p
x
}

# Take care of added lines
/^+[^+]/{i\
<table border=0 cellspacing=0 width=100% bgcolor=#dbf7ff><TR><TD>\
<TT><FONT color=darkblue>
:b
N
s/\n+/||||/
tb
h
s/.*\n//
/<BR>$/!s/$/<BR>/
x
s/\(.*\)\n[^\n]*$/\1/
s/||||/<BR>\
+/g
:space2
/  /{
  s|  |\&nbsp; |
  bspace2
}
s|$|</FONT></TT></TD></TR></TABLE>|p
x
}

# We need to do this, because we only want the spaces at the beginning

:c
/^Ä\+[^ ][-_A-Za-z0-9]/bend
s/^\(Ä*\) /\1Ä/
tc

:end
s/Ä/\&nbsp;/g
/^&nbsp;/{
  s|^|<TT>|
  s|$|</TT>|
}

:space3
/  /{
  s|  |\&nbsp; |
  bspace3
}

$a\
</BODY></HTML>
' # sed done 
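
The :space1, :space2 and :space3 loops all do the same thing: they turn runs of spaces into non-breaking spaces so the browser does not collapse them. A minimal sketch of that loop on its own, with a made-up function name keep_spaces:

```shell
#!/bin/sh
# Hypothetical demo of the space-preserving loop used in the script above:
# as long as two adjacent spaces exist, replace the first pair by "&nbsp; ".
keep_spaces() {
  sed '
    :again
    /  /{
      s|  |\&nbsp; |
      b again
    }
  ' "$@"
}

# Four spaces become three non-breaking spaces plus one ordinary space.
printf 'a    b\n' | keep_spaces
```

Single spaces are left alone, so ordinary text still wraps normally while indentation keeps its width.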

© 2001 Tilmann Bitterberg, tilmann@bitterberg.de
