www.bitterberg.de -- Tilmanns Corner -- sed

Tilmanns Corner - sed

Quickstart:
--> Links
--> Apache log analyser
--> Mail ads filter
--> Mail to HTML
--> Pine addresses to Vim
--> Callerid to Vbox
--> Indexhtml
--> diff(1) HTML beautifier

Links

sed links, taken from the excellent sed FAQ at http://www.student.northpark.edu/pemente/sed/sedfaq.html

Eric Pement http://www.student.northpark.edu/pemente/sed/index.htm

Seders Grab Bag http://spazioweb.inwind.it/seders/

Sven Guckes http://www.math.fu-berlin.de/~guckes/sed/

Felix von Leitner http://www.math.fu-berlin.de/~leitner/sed/

Yiorgos Adamopoulos http://www.dbnet.ece.ntua.gr/~george/sed/

Apache log analyser

A very simple web log analyser that extracts the search words from Google and AltaVista referrers, marking each with an a for AltaVista or a G for Google. Download here: sedlog.gz
Usage:

$ wget http://tibit.org/sedlog.gz
$ gzip -d sedlog.gz
$ chmod +x sedlog
$ ./sedlog access.log | vim -
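The sedlog script itself is only available as a download, so the following is merely a minimal sketch of the idea and not the actual script. It assumes log lines in combined format whose referrer is a Google search URL carrying the words in a q= parameter; the function name google_words and the sample line are made up for illustration. Percent-encoded characters are not decoded.

```shell
# Hypothetical helper, not the real sedlog: print "G <words>" for each
# log line on stdin whose referrer is a Google search URL with a q= parameter.
google_words() {
    sed -n '
        # Keep only the value of q= from a Google search referrer.
        s/.*google\.[a-z.]*\/search[^" ]*[?&]q=\([^&" ]*\).*/G \1/
        t hit
        # No Google referrer on this line: print nothing.
        b
        :hit
        # Search words are joined with "+" in the URL.
        y/+/ /
        p
    '
}

# Made-up sample line in combined log format.
printf '%s\n' '1.2.3.4 - - [18/May/2000:12:43:45 +0200] "GET /sed/ HTTP/1.0" 200 1234 "http://www.google.com/search?hl=en&q=sed+tutorial" "Mozilla/4.0"' | google_words
# prints: G sed tutorial
```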

Mail ads filter

A sed script that I use as an incoming mail filter to get rid of the ads in some newsletters I am subscribed to. I haven't had any bad side effects so far ;-) It is used in conjunction with procmail; the script comes first, followed by a sample procmail entry:

# remove ads in billiger
/+-.*-+/,//{
  /+-\{10,\}ANZEIGE-*+/{
    c\
WERBUNG ENTSORGT
  }         ;# if pattern found, which is the
            ;# last line of the ad, change it
            ;# to WERBUNG ENTSORGT ("ad disposed of").
  d         ;# delete pattern-space.
}

# remove footer in billiger
/^\*\**\*\ $/,/^$/d

# remove gmx ads
/^####*Anzeige###*$/,/^##*#$/{  ;# // repeats last regexp
  //{       ;# if // found
    N       ;# append next line to pattern-buffer
    /\n$/c\
WERBUNG ENTSORGT
  }         ;# if a \n is at the end of pattern-space
            ;# change it to WERBUNG ENTSORGT
  d         ;# delete pattern-space
}

:0fw:
* ^From:.*\<gmxred@gmx\.net\>.*
| sed -f $HOME/scripts/sed/deladds.sed >> /home/tibit/mail/.IN/gmx
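The core of the filter is a range address that swallows everything from a start marker to an end marker, and that part can be tried without procmail. The sketch below is a stripped-down variant, not the filter itself: it reuses the two gmx regular expressions but simply deletes the block instead of replacing it with WERBUNG ENTSORGT. The function name strip_ads and the sample mail body are invented for the demonstration.

```shell
# Hypothetical helper: delete every block that starts with a line like
# "#####Anzeige#####" and ends with a line consisting only of "#".
strip_ads() {
    sed '/^####*Anzeige###*$/,/^##*#$/d'
}

# Made-up mail body with one ad block in the middle.
printf '%s\n' 'Hello' '#####Anzeige#####' 'Buy something' '##########' 'Bye' | strip_ads
# prints:
# Hello
# Bye
```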

Mail to HTML

This script converts emails to HTML, which is quite nice for printing. Note that I did not write it; I only modified it.
URL: mail2html.sed

Pine addresses to Vim

I use pine as my mail client and Vim as my editor for writing mail. I wanted to use the email addresses defined in pine's .addressbook file as abbreviations in Vim. The following script generates abbreviations of the form mm<NICKNAME>.
URL: pine_addr_2_vim_ab.sed

# Format .addressbook to vim abbrevations
:redo;          # Define label redo
$s/\([^ ]*\) [^ ]* \([^ ]*\).*/ab mm\1 \2\3/
N;
/\n /!{
  s/\([^ ]*\) [^ ]* \([^ ]*\)[^\n]*\(\n\)/ab mm\1 \2\3/
  P
  D
}
s/\n//g
s/ *//g
b redo;
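To make the mapping concrete, here is a minimal sketch for the simplest case only. It assumes that each .addressbook entry sits on a single line with tab-separated fields in the order nickname, full name, address, and it ignores the continuation lines that the script above handles. The function name pine_to_vim and the sample entry are made up.

```shell
# Hypothetical helper: turn "nick<TAB>Full Name<TAB>address" lines into
# Vim abbreviations of the form "ab mm<nick> <address>".
pine_to_vim() {
    tab=$(printf '\t')
    # Field 1 is the nickname, field 2 (the full name) is dropped,
    # field 3 is the address; lines without three fields are skipped.
    sed -n "s/^\([^${tab}]*\)${tab}[^${tab}]*${tab}\([^${tab}]*\).*/ab mm\1 \2/p"
}

# Made-up address book entry.
printf 'jd\tDoe, John\tjohn@example.org\n' | pine_to_vim
# prints: ab mmjd john@example.org
```

Typing mmjd followed by a space in Vim's insert mode would then expand to the address.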

Callerid to Vbox

Converts ISDN callerid.conf entries to the vbox format. WARNING: adjust the dialing prefix (711 in the script) to your own.

/^#/d
/^$/d
/^ZONE.*/d
/^INTERFACE.*/d
/^\[MSN\]/{
  N
  N
  s/^\[MSN\]\nNUMBER\ *=\ *//
  s/ALIAS\ *=\ *//
  s/\n/ - /
  s/^+49//
  s/^+//
  /^[0-9]\{8,\}/!{
    /^711/!s/^/711/
#    ^^^       ^^^ adjust these
  }
  s/_/ /g
}
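The prefix handling is the part that needs adjusting, so here it is in isolation as a small sketch. It applies only the number-related commands from the script to bare phone numbers on stdin; the function name add_prefix and the sample numbers are invented.

```shell
# Hypothetical helper: normalise bare phone numbers the way the script does.
add_prefix() {
    sed '
        # Strip the German country code, or a bare leading plus sign.
        s/^+49//
        s/^+//
        # Numbers that do not start with at least 8 digits count as local ...
        /^[0-9]\{8,\}/!{
            # ... and get the 711 prefix unless they already have it.
            /^711/!s/^/711/
        }
    '
}

printf '%s\n' 123456 +4930123456 711555 | add_prefix
# prints:
# 711123456
# 30123456
# 711555
```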

Indexhtml

A pretty long script that generates an index of the links in an HTML file. I recently realized that when you print a web page you can't click on the links anymore ;-) I use it as a print filter for lpr. If sed complains about line 76, just use a different separator.
URL: indexhtml.sed

#!/bin/sed -f
# Thu May 18 12:43:45 CEST 2000 by tilmann@bitterberg.de
#
# Description:
#  Creates an index of links from a HTML file
#  Does something similar like lynx -force_html -dump but
#  leaves the document html (generate an index of links)
#
# Example: Input
#  <HTML><HEAD></HEAD><BODY>
#  foo1 <a Href="http://link.org">Click here</a> foo2
#  </BODY></HTML>
#
# Output:
#  <HTML><HEAD></HEAD><BODY>
#  foo1 <a Href="http://link.org">[1] Click here</a> foo2
#  <hr>[1] http://link.org<br>
#  </BODY></HTML>
#
# NOTE:
#  1) Will break at links like <A
#     HREF
#  2) Will only handle a fixed number of links (500 right now)
# TODO:
#  - Remove limits mentioned above
#  - let it handle weird HTML syntax

1{
# Put numbers in holdspace at first line
x
s/^/ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 \
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 \
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 \
77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 \
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 \
120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 \
139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 \
158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 \
177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 \
196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 \
215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 \
234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 \
253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 \
272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 \
291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 \
310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 \
329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 \
348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 \
367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 \
386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 \
405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 \
424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 \
443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 \
462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 \
481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 \
500/
# I don't like too long lines, that's why
s/\n//g
s/  */ /g
x
}

# Start at line 1
# Sometimes you only want to start from a pattern, so replace
# '1' with /PATTERN/
1,${
/[Aa] *[Hh][Rr][Ee][Ff] *= *"[^#]/{;  # don't look at internal links
  G;        # Get the numbers
  :loop;    # there may be multiple links per line, so loop
  # the ||||| is used as a marker.
  # We now have: foo1 <a href="blah.html"> foo2\n 1 2 3 4 .. 500
  # using newline as separator to the 's' command and 'I' for casei
s
\(a *href\) *= *\("\([^"]\+\)"[^>]*>\)\([^\n]*\(\n\)\) \([^ ]*\)\(.*\)$
\1|||||=\2[\6] \4\7\5[\6] \3<br>
I
#|----1----|     |----------2---------||-------4------| |---6---||--7--|
#                   |---3----|                 |--5-|
# Field  Contains:
# \1     a href
# \2     the link text up to the closing >
# \3     the link itself (http://foo.com)
# \4     the rest of the input line
# \5     a newline (\n)
# \6     the number we would like to use
# \7     everything up to the end of patternspace
#
# Now the line looks like:
# foo1 <a href|||||="blah.html">[1] foo2\n 2 3 4 .. 500\n[1] blah.html<br>
  t loop;       # look if there is another link in that line
  s/|||||//g;   # delete marker
  h;            # save how many numbers are used
  s/\n.*//;     # "restore" the original line
  x
  s/[^\n]*\n//
  x
}
}

# Just before the </body> insert index
/<\/[Bb][Oo][Dd][Yy]>/{
  x;              # insert saved stuff
  s/[^\n]*\n//;   # delete unused numbers
  s/^/<hr>/
  G
}
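Since sed has no arithmetic, the script counts by keeping a pool of unused numbers in the hold space and consuming one per link. The sketch below shows only that trick in isolation, numbering whole input lines instead of links and using a pool of five numbers instead of 500. The function name number_lines is made up, and the pool update is written differently from the script above to keep the example short.

```shell
# Hypothetical helper: prefix each input line with the next number taken
# from a pool that lives in the hold space.
number_lines() {
    sed '
        1{
            # Fill the hold space with the pool of numbers.
            x
            s/^/ 1 2 3 4 5/
            x
        }
        # Append the pool to the current line: "text\n 1 2 3 4 5".
        G
        # Move the first number of the pool in front of the text and
        # drop the rest of the appended pool.
        s/^\(.*\)\n \([^ ]*\).*$/[\2] \1/
        # Remove the number just used from the pool in the hold space.
        x
        s/^ [^ ]*//
        x
    '
}

printf '%s\n' first second third | number_lines
# prints:
# [1] first
# [2] second
# [3] third
```

As with the fixed limit of 500 links in the script, this sketch stops numbering once the pool is used up.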

Diff(1) to HTML

This script generates nice-looking HTML pages from diff output ("patches"). I have only tried it with unified diffs, since they are the only ones I use anyway. The script relies on the external utility expand(1), which converts tabs to spaces and should be available on any system.
URL: diffhtml.sed

#!/bin/sh
#
# Beautifies the output of diff -Nur to the HTML format
# needs the expand utility to convert tabs to spaces, to preserve
# indentation. The output HTML code is pretty bad, I think, and the
# colors suck somehow
#
# Sat Apr 14 21:59:36 CEST 2001
# by Tilmann Bitterberg

expand | sed '
s/>/\&gt;/g; s/</\&lt;/g
s|ü|\&uuml;|g; s|Ü|\&Uuml;|g; s|ä|\&auml;|g
s|Ä|\&Auml;|g; s|ö|\&ouml;|g; s|Ö|\&Ouml;|g
s|ß|\&szlig;|g

s/^+$/+ /
s/^-$/- /
s/^$/ /
/^[-+]/!s/$/<BR>/

1i\
<HTML><HEAD></HEAD><BODY bgcolor=white>

/^diff /{i\
<P>
s|[^/]*/||
s| .*||
s|.*|<font color=white><TT>&</TT></FONT>|
s|.*|<table border=0 cellspacing=0 width=100% bgcolor=#057205>\
<TR><TD>&</TD></TR></TABLE><P>|
}

# Could be done in one line
/^@@ /{
s|^@@|<font color=red><TT>@@|
s|$|</TT></FONT>|
}

\|^---|s|.*|<TT><font color=lightblue>&</FONT></TT><BR>|
\|^+++|s|.*|<TT><font color=darkblue>&</FONT></TT><BR>|

# Take care of removed lines
/^-[^-]/{i\
<table border=0 cellspacing=0 width=100% bgcolor=#eddcdc><TR><TD>\
<TT><FONT color=darkred>
:a
N
s/\n-/||||/
ta
h
s/.*\n//
/<BR>$/!s/$/<BR>/
x
s/\(.*\)\n[^\n]*$/\1/
s/||||/<BR>\
-/g
:space1
/  /{
s|  |\&nbsp; |
bspace1
}
s|$|</FONT></TT></TD></TR></TABLE>|p
x
}

# Take care of added lines
/^+[^+]/{i\
<table border=0 cellspacing=0 width=100% bgcolor=#dbf7ff><TR><TD>\
<TT><FONT color=darkblue>
:b
N
s/\n+/||||/
tb
h
s/.*\n//
/<BR>$/!s/$/<BR>/
x
s/\(.*\)\n[^\n]*$/\1/
s/||||/<BR>\
+/g
:space2
/  /{
s|  |\&nbsp; |
bspace2
}
s|$|</FONT></TT></TD></TR></TABLE>|p
x
}

# We need to do this, because we only want the spaces at the beginning
:c
/^Ä\+[^ ][-_A-Za-z0-9]/bend
s/^\(Ä*\) /\1Ä/
tc
:end
s/Ä/\&nbsp;/g

/^&nbsp;/{
s|^|<TT>|
s|$|</TT>|
}

:space3
/  /{
s|  |\&nbsp; |
bspace3
}

$a\
</PRE></BODY></HTML>
' # sed done
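Two steps of the script are easy to try on their own: escaping the angle brackets and turning leading spaces into non-breaking spaces so that the indentation survives in HTML. The following is a minimal sketch of just those two steps, written independently of the script; the function name escape_and_indent is invented, and the loop uses a simpler pattern than the Ä marker technique above.

```shell
# Hypothetical helper: escape < and > and convert only the leading
# spaces of each line to &nbsp; entities.
escape_and_indent() {
    sed '
        s/>/\&gt;/g
        s/</\&lt;/g
        :lead
        # Replace the first space that follows nothing but &nbsp; entities.
        s/^\(\(&nbsp;\)*\) /\1\&nbsp;/
        t lead
    '
}

printf '%s\n' '  if (a < b)' | escape_and_indent
# prints: &nbsp;&nbsp;if (a &lt; b)
```

Spaces inside the line are left alone, which keeps the HTML output readable.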
