Extracted function comments Mon Aug 22 00:40:04 2005

=item AdminVersion

=cut

=item Append

=cut

=item Assert

Usage:

    &Assert( conditional expression );

Assert is a useful debugging tool. Its one argument is a conditional expression that should be true in every possible case, as long as you've written your code correctly. If the argument turns out to be false at runtime, Assert prints an error message in very large, bold letters.

Assert is often used to audit function input and output values. These Assert calls should possibly be stripped or disabled in public releases.

=cut

=item Authenticate

=cut

=item BuildIndex

Usage:

    &BuildIndex();

BuildIndex completely rebuilds the index for a local realm. Because the web pages in local realms are readily accessible, this function tends to process huge data sets quickly. It is self-restarting through a meta-refresh: state information is stored in the $start_pos parameter, and working data is stored either in the database or in the index_file.working_copy file.

For file-based indexes, all new data is written to index_file.working_copy. When the process is finished, possibly after several browser requests, the original index_file is deleted and index_file.working_copy is renamed over the top of it. Users can therefore keep searching the intact index_file while BuildIndex is in progress, and the BuildIndex process can be safely abandoned.

For SQL-based indexes, there is no equivalent temporary storage area. Instead, each record is updated as its web page is encountered. At the end of the BuildIndex process (if it gets that far), all records whose lastindex time is older than "start_time" are deleted. The only records older than "start_time" are those that were not detected by GetFilesByDirEx or that were excluded for other reasons.

This is an interactive function; errors and other status messages are shown to the user by printing HTML.
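The file-based strategy described above, writing everything to a working copy and then renaming it over the live index, can be sketched as follows. This is a minimal sketch, not FDSE's code; the sub name rebuild_index_file and its one-record-per-line format are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild $index_file from @records without ever
# leaving the live index in a half-written state.
sub rebuild_index_file {
    my ($index_file, @records) = @_;
    my $working = "$index_file.working_copy";

    # All new data goes to the working copy; searches keep reading the
    # intact original in the meantime.
    open(my $out, '>', $working) or return "cannot write $working: $!";
    print {$out} "$_\n" for @records;
    close($out) or return "cannot close $working: $!";

    # Only when the rebuild is complete does the working copy replace the
    # original. On POSIX systems rename() replaces the target atomically;
    # where rename() refuses to overwrite an existing file, the original
    # is deleted first and the rename is retried.
    unless (rename($working, $index_file)) {
        unlink($index_file);
        rename($working, $index_file)
            or return "cannot rename $working: $!";
    }
    return '';    # empty string means success, in the style of the $err values used here
}
```

Because the original file is only replaced in the final step, an abandoned run leaves the old index intact, with at most a stale working copy behind.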
=cut

=item Cancel

=cut

=item Capitalize

Usage:

    my $cap_string = &Capitalize($string);

Capitalizes English-language strings.

=cut

=item CheckEmail

Usage:

    my $err = &CheckEmail( $address );
    if ($err) {
        print "\n";
    }

Checks whether the argument is a valid email address. The address must not be blank, must have the form text@text, and the text following the @ must be a valid hostname (one that can be resolved).

Based on Ian Dobson's CheckEmail function.

=cut

=item Close

=cut

=item CompressStrip

Processes the HTML text and various subfields, such as Title and Description.

=cut

=item Crawler_new

Usage:

    my %response = $crawler->webrequest(
        'page'  => 'http://www.xav.com/scripts/',
        'limit' => 'http://www.xav.com/',
    );
    if ($response{'err'}) {
        print "\n";
        exit;
    }
    print "The HTML text of this web page is:\n\n";
    print $response{'text'};

=cut

=item DeleteFromPending

Usage:

    my ($err, $delcount) = &DeleteFromPending( $realm, \@urls );

=cut

=item FD_Rules_new

Initializes the object that manages system settings.

=cut

=item FlockEx

Usage:

    if (&FlockEx( $p_filehandle, 8 )) {
        # okay
    }

Abstraction layer that protects systems without flock support.

=cut

=item FormatDateTime

=cut

=item FormatNumber

Usage:

    my $num_str = &FormatNumber( $expression, $decimal_places, $include_leading_digit, $use_parens_for_negative, $group_digits, $euro_style );

Arguments:

$expression - Required. The expression to be formatted.

$decimal_places - Optional. Numeric value indicating how many places to the right of the decimal point are displayed. Note: $expression is truncated to $decimal_places, not rounded.

$include_leading_digit - Optional. Boolean indicating whether a leading zero is displayed for fractional values.

$use_parens_for_negative - Optional. Boolean indicating whether negative values are placed within parentheses. This style is used for outbound formatting only; inbound parsing always uses "-" for negative values (Perl's internal format).

$group_digits - Optional. Boolean indicating whether digits are grouped using the comma.

$euro_style - Optional. If 1, "." separates thousands and "," separates decimals, e.g. "800.234,24" instead of "800,234.24". This style is used for outbound formatting only; inbound parsing always uses "." as the decimal separator (Perl's internal format).

Prototyped to match Microsoft's FormatNumber function for VBScript/JScript, except that it does not know about default settings. The Microsoft specification is available at http://msdn.microsoft.com/scripting/vbscript/doc/vsfctFormatNumber.htm or from http://msdn.microsoft.com/scripting/.
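The truncation, grouping, and separator rules listed above can be illustrated with a small sketch. It is not the original FormatNumber code: the sub name format_number_sketch is hypothetical, omitted optional booleans are assumed to mean false, and non-numeric input is mapped to 0 using the core Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical re-implementation of the documented FormatNumber rules.
# Assumption: omitted optional booleans are treated as false.
sub format_number_sketch {
    my ($expr, $places, $leading, $parens, $group, $euro) = @_;

    # Non-numeric input is treated as 0.
    $expr = 0 unless defined($expr) && looks_like_number($expr);
    $places = 0 unless defined $places;

    my $negative = $expr < 0;
    my $abs      = abs $expr;

    # Render about 15 significant digits so binary representation error
    # (e.g. 800234.24 stored as 800234.2399999...) does not leak into the
    # digits that are about to be truncated.
    my $int_digits = $abs >= 1 ? length(sprintf('%.0f', $abs)) : 1;
    my $fixed = sprintf('%.*f', ($int_digits < 15 ? 15 - $int_digits : 0), $abs);

    my ($int, $frac) = split /\./, $fixed;
    $frac = '' unless defined $frac;
    $frac = substr($frac, 0, $places);            # truncate, do not round
    $frac .= '0' x ($places - length($frac));     # pad if too short

    my $is_zero = ($int . $frac) !~ /[1-9]/;
    $int = '' if $int eq '0' && $places > 0 && !$leading;

    my ($dec_sep, $grp_sep) = $euro ? (',', '.') : ('.', ',');
    if ($group) {
        # Insert a separator before each trailing group of three digits.
        1 while $int =~ s/^(\d+)(\d{3})/$1$grp_sep$2/;
    }

    my $out = $places > 0 ? "$int$dec_sep$frac" : $int;
    if ($negative && !$is_zero) {
        $out = $parens ? "($out)" : "-$out";
    }
    return $out;
}
```

For example, format_number_sketch(800234.24, 2, 1, 0, 1, 1) yields the euro-style string "800.234,24" quoted above.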
Error handling: if $expression is not numeric, it is treated as 0.

=cut

=item GetCrawlList

Usage:

    my @list = ();
    my $count = 0;
    my $age = $::FORM{'StartTime'};
    if ($::FORM{'DaysPast'}) {
        $age -= (86400 * $::FORM{'DaysPast'});
    }
    my $err = &GetCrawlList( $realm, $age, $max_list_size, \@list, \$count );

Retrieves into @list all web pages in the $realm realm that are older than $age. $count is the size that @list would be if no limits were imposed; @list itself will contain between 0 and $max_list_size elements. The $max_list_size option is available to save memory.

=cut

=item GetFiles_new

Used to enumerate all files and folders in a given directory. Designed to use very little memory. Files are always returned in alphabetical order, which allows certain optimizations to be made.

Usage:

    my $fr = &fdse_filter_rules_new();
    my $gf = &GetFiles_new();
    $err = $gf->create_file_list(
        'base_dir'      => $base_dir,
        'base_url'      => $base_url,
        'fr'            => \$fr,
        'tempfile'      => "$file.temp",
        'no_older_than' => $num_seconds,
    );
    my $count = $gf->{'count'};
    $gf->resume_file_position( $start_pos );
    while (1) {
        my ($lastmodt, $size, $fullfile, $basefile, $url) = $gf->get_next_file();
        # ... process the file; leave the loop with "last" when no files remain
    }
    $gf->quit(); # kills temp file

no_older_than is the maximum tolerable age, in seconds, of the cache file. If the file exists and is older than this, a new file is created.

=cut

=item LoadRules

Usage:

    $err = &LoadRules();

Wrapper around the FD_Rules object and its own loadrules() method, with additional processing. Writes directly to the global %::Rules hash, and writes some derived data to %::const as well.

=cut

=item LockFile_get_read_access

Gets read access to the file. Handles the "create_if_needed" logic. Tries to restore a stale "working_copy" file if no copy of the original file exists.

=cut

=item LockFile_new

This package provides an object-oriented approach to file I/O, with support for file locking and standardized error handling.
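The locked read that LockFile_new performs, and that FlockEx guards on systems without flock, can be approximated with Perl's built-in flock() and the core Fcntl constants. This is a minimal sketch under those assumptions, not the package itself; read_locked is a hypothetical name, and its 'create_if_needed' option only loosely mirrors the option of the same name.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper returning ($err, $text), following the ($err, ...)
# convention used in this document.
sub read_locked {
    my ($file, %opts) = @_;

    # Loose analogue of 'create_if_needed': create an empty file first.
    if (!-e $file && $opts{'create_if_needed'}) {
        open(my $new, '>>', $file) or return ("cannot create $file: $!");
        close($new);
    }
    open(my $fh, '<', $file) or return ("cannot read $file: $!");

    # Shared lock: other readers may proceed, writers holding LOCK_EX are
    # excluded. eval{} keeps going on platforms where flock() is unimplemented.
    eval { flock($fh, LOCK_SH) };

    local $/;                      # slurp mode: read the whole file at once
    my $text = <$fh>;
    $text = '' unless defined $text;

    eval { flock($fh, LOCK_UN) };  # LOCK_UN has the value 8
    close($fh);
    return ('', $text);
}
```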
Usage:

    my ($err, $obj, $p_rhandle, $p_whandle) = ();
    Err: {
        $obj = &LockFile_new(
            'create_if_needed' => 1,
        );
        ($err, $p_rhandle) = $obj->Read( $file );
        next Err if ($err);
        while ($_ = readline($$p_rhandle)) {
            print $_;
        }
        $err = $obj->Close();
        next Err if ($err);
        last Err;
    }
    continue {
        print "\n";
    }

=cut

=item Merge

=cut

=item ParseRobotFile

Usage:

    my @forbidden_paths = &ParseRobotFile( $RobotText, $my_user_agent );

Accepts the text of a robots.txt file and the name of the current HTTP user-agent. Parses the file and returns an array of all forbidden paths that apply to the current user-agent.

=cut

=item PrintOrderedHash

Usage:

    my $err = &PrintOrderedHash( \%hash, $by_value, $ascii_sort, $ascending, $date_map );

=cut

=item PrintTemplate

Usage:

    &PrintTemplate( $b_return_as_string, 'tips.html', 'german', \%replace_values, \%visited, \%cache );

See "admin_help.html" for extensive documentation on this function, its limitations, its failure scenarios, etc.

=cut

=item RawTranslate

Usage:

    my $lc_ai_string = &RawTranslate($string);

Returns a lowercase, accent-stripped version of its input, and replaces HTML-encoded characters with their ASCII equivalents. This function is called mainly by &CompressStrip, and also by &LoadRules when preparing the code for ignore words. See http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html

=cut

=item Read

=cut

=item ReadFile

Usage:

    my ($err, $text) = &ReadFile($file);
    if ($err) {
        print "\n\n";
        print "\n$trace\n\n";

SendMailEx knows of 2 ways to handle a message:

1. Pipe the message to a process, such as /usr/sbin/sendmail or c:/blat.exe, defined with the 'pipeto' parameter. If using /usr/sbin/sendmail, include the "-t" flag in the pipeto input, e.g.:

    'pipeto' => '/usr/sbin/sendmail -t',

2. Deliver to a known SMTP server, defined using the 'host' parameter.

The options are listed above in order of speed and reliability. Saving the message to a folder is generally just a failover method to prevent the loss of user data; no message will actually be sent.

By default, SendMailEx attempts these methods in order. You can override this with the 'handler_order' parameter, which is a string such as "12345", "54321", or "23". If the 'pipeto', 'host', or 'saveto' parameters aren't defined, SendMailEx skips the handling methods that depend on them.

=cut

=item SetDefaults

Usage:

    my $text = &SetDefaults( $html, \%params );

Takes $html, an HTML fragment that includes FORM elements, and sets all default attributes to match %params.

Requires a strict format: it will generally accept double-quoted attributes, or unquoted attributes that don't contain any embedded spaces.

In the case of "hidden"-type fields, it only inserts new values for hidden form elements that do not already have a value. For example, a hidden INPUT tag without a value attribute will receive an automatic value="" attribute, but a hidden INPUT tag that already has an explicit value="" attribute will not be touched.

This code inserts checked="checked" and selected="selected" attributes for the appropriate form elements, overwriting any existing checked/selected attributes. It also overwrites default value="x" values for INPUT TEXT, INPUT PASSWORD, and TEXTAREA.
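The hidden-field rule described above (fill in a value only when the tag has none) can be sketched with a regular expression. This is a deliberately narrow illustration, not SetDefaults itself: fill_hidden_defaults and its helpers are hypothetical names, the inserted value is assumed to come from %params keyed by the field's name, and like SetDefaults it relies on a strict attribute format rather than a real HTML parser.

```perl
use strict;
use warnings;

# Minimal escaping for a value placed inside a double-quoted attribute.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical: add value="..." to one hidden INPUT tag if %$p_params has
# an entry for its name and the tag has no value attribute yet.
sub fill_one_tag {
    my ($tag, $p_params) = @_;
    my ($name) = $tag =~ /\bname="?([^"\s>]+)"?/i;
    return $tag unless defined $name && exists $p_params->{$name};
    return $tag unless $tag =~ /\btype="?hidden"?/i;
    return $tag if $tag =~ /\bvalue\s*=/i;    # explicit value: leave it alone
    my $value = html_escape($p_params->{$name});
    # Insert before the closing ">" or " />", keeping that ending intact.
    $tag =~ s{(\s*/?>)\z}{ value="$value"$1};
    return $tag;
}

# Apply the rule to every INPUT tag in an HTML fragment.
sub fill_hidden_defaults {
    my ($html, $p_params) = @_;
    $html =~ s{(<input\b[^>]*>)}{fill_one_tag($1, $p_params)}gie;
    return $html;
}
```

Tags whose values contain a literal ">" would defeat the simple tag pattern, which is the same kind of strict-format limitation the SetDefaults notes describe.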
Changed 2002-05-16: now case-preserving on INPUT|SELECT|TEXTAREA tags, attempting to make the output more XHTML-compliant.

Changed 2005-07-07: inserting a leading line break before the value in TEXTAREA elements. Note that this will cause corruption in Opera v7 and v8.01, because Opera does not ignore the leading \r\n, so each time the textarea is edited, a new newline appears at the top. A bug has been logged against Opera for this: http://my.opera.com/forums/showthread.php?threadid=94823&highlight=textarea (bug 173282).

TODO / BUG: this code doesn't properly handle