Hello Pascale, Delphine and Karl,
G'day. :) I've generated a glossary from the "nwtitle" and "nwurl" tags, which are present in every XML file except 'data.xml'. Each line of the glossary has the format:
nwtitle @@ nwurl
The resulting file is available at http://www.normandieweb.org/web/glossary.txt.
Here is the Perl source. It parses a single XML file with XML::LibXML and prints one glossary item:
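Each line holds exactly one entry, so a consumer only has to split it at the first ' @@ ' (space, two at-signs, space, as printed by genGlossaryItem.pl). Here is a minimal POSIX sh sketch of that split. The helper name split_glossary_line is hypothetical, and it assumes no title itself contains ' @@ ':

```sh
#!/bin/sh
# split_glossary_line: hypothetical helper, not part of the NormandieWeb
# scripts. Splits one "nwtitle @@ nwurl" line at the first ' @@ ' and
# stores the two fields in the variables $title and $url.
split_glossary_line() {
    title=${1%% @@ *}   # drop everything from the first ' @@ ' onwards
    url=${1#* @@ }      # drop everything up to and including the first ' @@ '
}

split_glossary_line 'Rouen par Aurore Daubenfeld @@ http://www.normandieweb.org/76/rouen/rouen/rouenaurore.html'
printf 'title: %s\nurl: %s\n' "$title" "$url"
```

Only shell parameter expansion is used, so no external tools are needed.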
#!/usr/bin/perl -w
#
# genGlossaryItem.pl
# This script is exclusively for generating a glossary for NormandieWeb.
# It should be called from the shell with the name of an XML file
# as the only argument.
# It parses this file and prints out the contents
# of <nwtitle> and <nwurl>,
# delimited by ' @@ '.
# E.g.:
#   Rouen par Aurore Daubenfeld @@ http://www.normandieweb.org/76/rouen/rouen/rouenaurore.html
#
# Author: Stephanie Troeth
# Date: 14th November 2002
#
use strict;
use XML::LibXML;
# Bail out if we don't have the single argument we are expecting to receive
if (@ARGV != 1)
{
print STDERR "Usage: genGlossaryItem.pl <xmlfilename>\n";
exit 1;
}
# call our main
main();
#~~~~~~~~~~~~~~~
# main function
#~~~~~~~~~~~~~~~
sub main
{
my $xmlfile = shift(@ARGV); # get commandline argument
my $parser = XML::LibXML->new(); # create new instance of parser
$parser->load_ext_dtd(0); # skip DTD: don't load the external DTD while parsing
my $tree = $parser->parse_file($xmlfile); # parse
my $root = $tree->getDocumentElement; # get the root of our doc tree
# use XPath to get the title and url
my $title = $root->findvalue('/nwcity/nwtexte/nwtitle');
my $url = $root->findvalue('/nwcity/nwtexte/nwurl');
print "$title @@ $url\n";
}
# end main
The wrapper shell script looks for all relevant XML files and passes each file through genGlossaryItem.pl:
#!/bin/sh
# genGlossary.sh
#
# This script is exclusively for generating a glossary for NormandieWeb.
# It recursively finds XML files (all except anything with 'data' in
# the file/path name) and passes each file to genGlossaryItem.pl.
# genGlossaryItem.pl parses each file
# and prints <nwtitle> and <nwurl>
# in the format of:
#
# <nwtitle> @@ <nwurl>
#
# The output of genGlossaryItem.pl is redirected (appended) to a file
# called 'glossary.txt'
#
# Authors: Stephanie Troeth, Karl Dubost
# Date: 14th November 2002
#
# Start from an empty glossary.txt so that re-running the script
# does not append duplicate entries
: > glossary.txt
echo Start: `date`
# Read paths one per line so that names containing spaces survive
find /Users/karl/Sites/NW/nwxml/ -name '*.xml' | grep -v data |
while IFS= read -r i; do
    ./genGlossaryItem.pl "$i" >> glossary.txt
done
echo End: `date`
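Because genGlossaryItem.pl prints whatever findvalue returns, an XML file without <nwtitle> or <nwurl> produces an entry with an empty field (for example " @@ "). A rough sketch of a sanity check to run on glossary.txt afterwards; check_glossary is a hypothetical helper, not part of the scripts above:

```sh
#!/bin/sh
# check_glossary: hypothetical sanity check, not part of genGlossary.sh.
# Reads glossary lines on stdin and reports (with line numbers) any line
# that lacks the ' @@ ' delimiter or has an empty nwtitle or nwurl.
# Exit status is 0 if every line looks well-formed, 1 otherwise.
check_glossary() {
    n=0
    bad=0
    while IFS= read -r line; do
        n=$((n + 1))
        case $line in
            *' @@ '*) ;;   # delimiter present, check the fields below
            *) printf 'line %d: %s\n' "$n" "$line"; bad=1; continue ;;
        esac
        title=${line%% @@ *}
        url=${line#* @@ }
        if [ -z "$title" ] || [ -z "$url" ]; then
            printf 'line %d: %s\n' "$n" "$line"
            bad=1
        fi
    done
    [ "$bad" -eq 0 ]
}

# Example: one real entry followed by a hypothetical empty one
printf '%s\n' \
    'Rouen par Aurore Daubenfeld @@ http://www.normandieweb.org/76/rouen/rouen/rouenaurore.html' \
    ' @@ ' | check_glossary || echo "glossary has malformed entries"
```

On the real file this would be run as `check_glossary < glossary.txt`.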
Later on, the CMS will refer to this file whenever it generates the XHTML pages.
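How the CMS will read the file isn't specified yet. As an illustration only, here is a POSIX sh sketch of looking up the URL for a given title. The helper lookup_url is hypothetical and assumes titles are matched exactly:

```sh
#!/bin/sh
# lookup_url: hypothetical helper, not part of the NormandieWeb CMS.
# Reads "nwtitle @@ nwurl" lines on stdin and prints the nwurl of the
# first line whose nwtitle equals $1 exactly. Returns 1 if none match.
lookup_url() {
    while IFS= read -r line; do
        if [ "${line%% @@ *}" = "$1" ]; then
            printf '%s\n' "${line#* @@ }"
            return 0
        fi
    done
    return 1
}

# Example using the sample entry from genGlossaryItem.pl's header comment
printf '%s\n' 'Rouen par Aurore Daubenfeld @@ http://www.normandieweb.org/76/rouen/rouen/rouenaurore.html' |
    lookup_url 'Rouen par Aurore Daubenfeld'
```

Against the generated file this becomes `lookup_url 'Rouen par Aurore Daubenfeld' < glossary.txt`. A linear scan is fine for a small glossary; a CMS would more likely load the file once into a map keyed by title.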
Posted by steph at November 15, 2002 05:02 PM
Hello Steph!
What a piece of work... I don't understand a word of English or of Perl, but I can clearly see that it's... great work.
I'm counting on Karl to translate for me ;-) And I hope we'll be able to link your work with mine easily.
Bye!
Posted by: Pascale on November 16, 2002 03:43 PM