Hello Pascale, Delphine and Karl,
G'day. :) I've generated a glossary based on the "nwtitle" and "nwurl" tags which are found in all XML files except 'data.xml'. The data format is:
nwtitle @@ nwurl
This file is found at http://www.normandieweb.org/web/glossary.txt.
The following is the Perl source, which parses each XML file using XML::LibXML and outputs a glossary item:
#!/usr/bin/perl -w
#
# genGlossaryItem.pl
# This script is exclusively for generating a glossary for NormandieWeb.
# It should be called from the shell with the name of an XML file
# as the only argument.
# It parses this file and prints out the contents
# of <nwtitle> and <nwurl>,
# delimited by ' @@ '.
# Eg:
# Rouen par Aurore Daubenfeld @@ \
#   http://www.normandieweb.org/76/rouen/rouen/rouenaurore.html
#
# Author: Stephanie Troeth
# Date: 14th November 2002
#
use strict;
use XML::LibXML;
local $XML::LibXML::skipDTD = 1;   # skip DTD

# Bail out if we don't have the single argument we are expecting to receive
if ($#ARGV != 0) {
    print "Usage: genGlossaryItem.pl <xmlfilename>\n";
    exit;
}

# call our main
&main();

#~~~~~~~~~~~~~~~
# main function
#~~~~~~~~~~~~~~~
sub main {
    my $xmlfile = shift(@ARGV);                 # get commandline argument
    my $parser  = XML::LibXML->new();           # create new instance of parser
    my $tree    = $parser->parse_file($xmlfile); # parse
    my $root    = $tree->getDocumentElement;    # get the root of our doc tree

    # use XPath to get the title and url
    my $title = $root->findvalue('/nwcity/nwtexte/nwtitle');
    my $url   = $root->findvalue('/nwcity/nwtexte/nwurl');

    print "$title @@ $url\n";
} # end main
The wrapper shell script looks for all relevant XML files and passes each file through genGlossaryItem.pl:
#!/bin/sh
# genGlossary.sh
#
# This script is exclusively for generating a glossary for NormandieWeb.
# It recursively finds XML files (all except anything with 'data' in
# the file/path name) and passes each file to genGlossaryItem.pl.
# genGlossaryItem.pl parses each file
# and prints <nwtitle> and <nwurl>
# in the format of:
#
# <nwtitle> @@ <nwurl>
#
# The output of genGlossaryItem.pl is redirected (appended) to a file
# called 'glossary.txt'
#
# Authors: Stephanie Troeth, Karl Dubost
# Date: 14th November 2002
#
touch glossary.txt
XMLDIR=`find /Users/karl/Sites/NW/nwxml/ -name '*.xml' | grep -v data`
echo Start: `date`
for i in $XMLDIR; do
    ./genGlossaryItem.pl "$i" >> glossary.txt
done
echo End: `date`
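Because findvalue returns an empty string when an XPath expression matches nothing, an XML file without <nwtitle> or <nwurl> would still produce a line, just with an empty side. A quick sanity check on the generated file could look something like the sketch below. The function name check_glossary is made up for illustration, and the check assumes that neither titles nor URLs contain ' @@ ' themselves.

```shell
#!/bin/sh
# check_glossary (hypothetical helper, not part of the original scripts):
# prints every line of the given glossary file that does not look like
#   <non-empty title> @@ <non-empty url>
# prefixed with its line number. Returns non-zero if any such line exists.
check_glossary() {
    awk -F ' @@ ' '
        NF != 2 || $1 == "" || $2 == "" {
            printf "%d: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Usage (assumes glossary.txt is in the current directory):
#   check_glossary glossary.txt || echo "glossary.txt has suspicious lines"
```

Any line it reports points back to an XML file worth inspecting by hand.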
Later on, this file will be referenced by the CMS whenever XHTML pages are generated.
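How the CMS will actually read the glossary is not decided here, but as a rough sketch, looking up the URL for a given title only needs a split on the ' @@ ' delimiter. The function name lookup_url is hypothetical, and the sketch assumes titles are matched exactly and contain no backslashes.

```shell
#!/bin/sh
# lookup_url (hypothetical helper): print the URL recorded for a title.
#   $1 = exact title to look for
#   $2 = path to the glossary file
# Prints nothing if the title is not found; only the first match is used.
lookup_url() {
    awk -F ' @@ ' -v title="$1" '
        $1 == title { print $2; exit }
    ' "$2"
}

# Usage (assumes glossary.txt is in the current directory):
#   lookup_url "Rouen par Aurore Daubenfeld" glossary.txt
```

With the example entry from the genGlossaryItem.pl header in the file, that call would print the rouenaurore.html URL.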
Posted by steph on November 15, 2002 05:02 PM

Hello Steph!
Quite a job... I don't understand any English or any Perl, but I can see that this is... great work.
I'm counting on Karl to translate for you ;-) And I hope we will be able to join your work and mine together easily.
Bye!
Posted by: Pascale on November 16, 2002 03:43 PM