This file is part of the pdr/pdx project.
Copyright (C) 2010 Torsten Mueller, Bern, Switzerland

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

pdr/pdx 1.1.0 - User Manual

pdr ("personal data recorder") and pdx ("personal data expert") are free applications for tracking, managing and evaluating personal, mostly numeric data.

Contents

1. Introduction and basics

2. Glossary
2.1. collections
2.2. rejections
2.3. expressions
2.4. selections

3. Working principles
3.1. pdr
3.1.1. Functionality
3.1.2. Data sources and transactions
3.1.3. Input per command line
3.1.4. Input per mail (POP3)
3.1.5. Input per text files
3.1.6. Input per CSV files
3.1.7. Input per XML files
3.1.7.1. The pdr XML format
3.1.8. Interactive Mode
3.1.9. XML-Export
3.2. pdx
3.2.1. Functionality
3.2.2. Built-in functions
3.2.2.1. Date and time functions
3.2.2.2. Functions for data selection
3.2.2.3. Statistic functions
3.2.2.4. Arithmetic functions
3.2.2.5. Functions for reports
3.2.2.6. Functions for diagrams
3.2.2.7. Other functions
3.2.3. Interactive mode
3.2.4. All together: the creation of reports and diagrams

4. Invocation
4.1. pdr
4.1.1. Arguments
4.1.2. Options
4.1.3. Examples
4.2. pdx
4.2.1. Options
4.2.2. Examples

5. Configuration
5.1. General options
5.2. Database options
5.2.1. SQLite
5.2.2. MySQL
5.3. Input options
5.3.1. Configuration of a POP3-mailbox
5.3.2. Configuration of a text file
5.3.3. Configuration of a CSV file
5.3.4. Configuration of an XML file
5.4. Output options
5.4.1. Configuration of a report
5.4.2. Configuration of a diagram


1. Introduction and basics

People work on computers, often for several hours a day. In addition, almost everyone carries a mobile phone, a palmtop or a similar mobile device during their computerless time. Personal data that accrue over the day could easily be evaluated if all these channels were made usable. The user should be able to choose among the means at his disposal, and these can differ greatly depending on location, day and time of day: on a PC he can enter his data directly with pdr or send himself an e-mail, perhaps using a command line tool like sendmail or from inside an office application. With his mobile phone he can also send e-mails or SMS. Maybe he uses measurement devices that collect data in their own memory - later he can transfer them to a computer over USB, Bluetooth, infrared or something else. And perhaps he already has software producing data in a usable XML format. All these channels must be equivalent and open.

The initial situation is based on the following assumptions:
  1. We have at least one convenient medium for getting personal data on a computer and the effort to use it is acceptable.
  2. Data input and data evaluation do not happen at the same time, especially not in real time. We get data (possibly much) more frequently than they have to be evaluated.
  3. That's why data input must be fast, easy and mobile. This is the most important criterion for acceptance.
  4. For data evaluation the time required is much less critical. There the criteria are capability, effectiveness and configurability.
  5. Data evaluation means the creation of static reports and diagrams. There's no need for interactive work on the data.
Background: The initial idea was to log individual medical data (blood sugar, blood pressure, body temperature, heart rate, weight and also medication). Especially diabetics taking insulin measure and collect a lot of such data every day, and it is very interesting for them (and for physicians and specialists) to track, evaluate and annotate these data.

The applications are not specialized for medical use cases. You can use them just as well for technical, sports, weather, environmental or financial data, for example for jogging distances and times, or for your car's fuel consumption, the distances driven and the costs. All you need is a continuous flow of numeric data.


2. Glossary

2.1. collections

The database is the connective link between pdr and pdx. Normally the user has no need to bother about its internal structure. With pdr and pdx he works almost exclusively with so-called collections (series of measurements). The concept: a collection stores all values of one concrete series of measurements, each value together with a unique timestamp:

[...]
2008-12-17 21:45:00    5.9
2008-12-18 05:00:00    6.1
2008-12-18 12:45:00    5.3
2008-12-18 18:45:00    5.3
2008-12-18 21:45:00    4.7
2008-12-19 05:00:00    5.2
2008-12-19 12:45:00    5.4
2008-12-19 18:45:00    4.7
2008-12-19 21:45:00    5.7
[...]

If five parameters are to be measured, five collections are needed. With pdr the user can list, create and delete such collections at any time.

Every collection has a unique name. This name is a combination of the following characters:

A...Z   a...z   _ * + ! ? ^ ° § $ / &  [ ] { } = ~

The name is case sensitive. The number of collections and the length of their names are unlimited.

Note: Because these names are used in expressions, which happens quite often, collection names should be short. There is nothing wrong with using single characters (especially letters).

Two collections have fixed names: * and #. The first one is the so-called default collection, which is always numeric. The second one is the comment collection, which is text. These two collections don't have to be created explicitly; they always exist. The reason is their special (nameless) use in expressions. You should reserve these two collections for your most important use case.

Each collection has one concrete type for all of its data values. This type has to be declared when the collection is created. Mixed collections are not possible. There are three possible types of collections:

2.2. rejections

If a database input doesn't meet the required conditions, perhaps because something is misspelled or a date is invalid, the input is rejected. This means its data don't get into the valid data pool, i.e. not into collections. However, they are stored in a separate table and are not lost.

The idea is that these data can be corrected interactively later. This is especially important for input per e-mail, because e-mail messages can't be corrected on the mail server and it doesn't make sense to leave them there. At the moment only e-mail input is rejected when necessary; the other data sources can be corrected as they are and don't need to be handled this way.

For handling rejections pdr has two special command line options.

2.3. expressions

During data input from e-mail mailboxes, the command line or text files, pdr interprets so-called expressions. Every text line is an expression. An expression can contain several values, so we have to declare which value should go into which collection. For this we use a simple syntax - the name of the collection is used as a suffix:

[date] [time] (value[collection])* [; comment]

This definition means:
Note: If an expression contains two values for the same collection, only the latter is used; a collection can hold only one value per timestamp because the timestamp is the unique key.

Date and time have a concrete, non-localized syntax:

[CCYY-]MM-DD   and   hh:mm[:ss]

Examples

Given that we have the following collections in the database: l, m, n (all numeric) and anyway * and #. The following expressions would be correct:

5.2                    (implicit use of default collection)
5.2*                   (explicit use of default collection)
5.2 8l 7n 1m
2009-08-16 12:34 5.3 9n ; this is my comment
23:45 15l
; comment only

We see that simple data input is a primary design goal, even if these expressions look a bit cryptic at first glance. They (look at the first three lines) can easily be entered even with the limited capabilities of mobile phones, and they only ever have to be read again by a machine. If there is no other way to transmit the data, you can also put them into a text file, or write them on a sheet of paper and enter them later.
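As a purely illustrative sketch (pdr's real parser is not published in this manual; every name below is invented, and the collection-name pattern covers only a subset of the allowed characters), the expression syntax above can be handled with a few lines of Python:

```python
import re

# Hypothetical sketch of expression parsing: [date] [time] (value[collection])* [; comment]
VALUE = re.compile(r'^(-?[0-9]+(?:\.[0-9]+)?)([A-Za-z_*+!?^$/&=~]*)$')
DATE  = re.compile(r'^\d{4}-\d{2}-\d{2}$')          # simplified: CCYY- part required here
TIME  = re.compile(r'^\d{1,2}:\d{2}(?::\d{2})?$')

def parse_expression(line, known_collections):
    """Split one line into (date, time, {collection: value}, comment)."""
    date, time, values, comment = None, None, {}, None
    if ';' in line:
        line, comment = line.split(';', 1)
        comment = comment.strip()
    for tok in line.split():
        if DATE.match(tok):
            date = tok
        elif TIME.match(tok):
            time = tok
        else:
            m = VALUE.match(tok)
            if not m:
                raise ValueError('bad token: %r' % tok)
            coll = m.group(2) or '*'          # no suffix -> default collection
            if coll not in known_collections:
                raise ValueError('unknown collection: %r' % coll)
            values[coll] = float(m.group(1))  # later value wins per collection
    return date, time, values, comment
```

For example, parse_expression('2009-08-16 12:34 5.3 9n ; this is my comment', {'*', '#', 'l', 'm', 'n'}) yields ('2009-08-16', '12:34', {'*': 5.3, 'n': 9.0}, 'this is my comment').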

2.4. selections

A selection is a part of a collection. This term is specific to pdx and its evaluations and calculations. You get a selection by invoking the function select or from a calculation that returns a selection. The limitation of a selection relative to its collection is always based on time, because a collection has only one dimension: time. The values of a selection need not be contiguous in time; a selection can contain gaps. For example, you can select the values of a collection for every day of a month, but only those between 8 and 9 o'clock.


3. Working principles

3.1. pdr

3.1.1. Functionality

pdr collects data from several data sources and puts them into collections of the database:

mobile phone -> e-mail mailbox
                      ...      \
              measuring device -+-> pdr -> database
                      ...      /
                      XML file

The database is the only link between pdr and pdx.

3.1.2. Data sources and transactions

At the moment there are the following types of data sources available:

  - command line, e-mail mailbox (POP3) and text files - these three data sources work with expressions
  - CSV and XML files - these data sources work with specific data formats in files
Most of the data sources (inputs) have to be configured. When pdr is invoked, the configured data sources are then queried in the configured order, on the assumption that they have unprocessed data.

pdr uses transactions to guarantee data integrity in the database as far as possible. Such a transaction lasts from the invocation of the program (i.e. the acceptance of the parameters) until the insertion of the values into the database. We truly want to exclude the case that data from a data source are only partly inserted into the database. If a failure occurs during processing, the data should be corrected outside the database (i.e. on the data source) and processing can be started again.

Note: E-mail data are different (see there).

Configured data sources are each processed in their own transaction.

Data sources specified on the command line each get their own transaction only if they are files. Expressions specified on the command line are collected and processed together in one transaction.

3.1.3. Input per command line

The simplest (and least comfortable) way to get data into the system is the pdr command line, i.e. the invocation of pdr itself. Nothing needs to be configured for this.

pdr has the command line option -e (--expression) which allows you to specify an expression. This option can be used multiple times. Moreover, all arguments after pdr that are not part of a command line option (or an argument to one) are concatenated into one big expression and processed at once (see there).

If an expression on the command line doesn't have a timestamp the current date and time will be used.

If processing fails because of any incorrectness in an expression, pdr produces a message. No data transfer into the rejections takes place.

3.1.4. Input per mail (POP3)

For the use of e-mail mailboxes we assume that data (mails) have arrived in the mailbox and that they are not processed by any other application. These mails must have the following properties:
  1. a unique subject
  2. an exploitable timestamp (normally the SMTP server adds one during sending)
  3. plain ASCII text format (no HTML, RTF ...)
  4. text completely in expressions
If an e-mail data source is configured, the mail server is queried during the next invocation. pdr checks whether there are mails on the server, checks their subject and processes matching e-mails one by one, line by line; each line is an expression. If a line has a timestamp, that one has priority; otherwise the timestamp of the e-mail applies implicitly. This is very handy because in usual, single-line e-mails you normally never have to enter a timestamp manually.

Here's a complete e-mail source:

From: Torsten Mueller <Mymail@gmx.net>
To: MyMail@gmx.net
Subject: Q
Date: Thu, 04 Feb 2010 17:56:11 +0100
Message-ID: <87pr4ley8k.fsf@castor.ch>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

5.3 8i

Normally most of the values in the header lines are taken from lists. Date and Message-ID are added by the server; MIME-Version and Content-Type come from the e-mail client application. The only remaining parts that really have to be typed are the subject (which is why it should be short - a single letter here) and the content of the message, the data line.

Processed e-mails are deleted from the server regardless of success, so they never get processed a second time. This deletion can be suppressed by configuration.

If processing fails because of any incorrectness in an expression, pdr transfers these expressions into the rejections and writes out a message.
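To make the polling step described above concrete, here is a rough Python sketch using the standard poplib and email modules. It is an illustration only: the function names are invented, the expression parsing itself is omitted, and the real pdr implementation may differ.

```python
import poplib
from email import message_from_bytes

def expressions_from_mail(raw_bytes, subject):
    """Return (fallback_timestamp, expression) pairs from one raw mail,
    or None if the subject doesn't match (i.e. this is not a pdr mail)."""
    msg = message_from_bytes(raw_bytes)
    if msg['Subject'] != subject:
        return None
    fallback = msg['Date']                 # used when a line has no timestamp
    return [(fallback, line.strip())
            for line in msg.get_payload().splitlines() if line.strip()]

def poll_mailbox(host, user, password, subject):
    """Fetch matching mails, collect their expression lines, delete them."""
    box = poplib.POP3(host)
    box.user(user)
    box.pass_(password)
    collected = []
    for i in range(1, len(box.list()[1]) + 1):
        exprs = expressions_from_mail(b'\r\n'.join(box.retr(i)[1]), subject)
        if exprs is None:
            continue                       # subject doesn't match: skip
        collected.extend(exprs)
        box.dele(i)                        # processed mails never come back
    box.quit()
    return collected
```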

3.1.5. Input per text file

If we use a text file for data input, every line counts as an expression. This method is practical if you receive data during a period without any opportunity to transmit them online, so you have to collect them manually in a file, expression by expression.

Lines starting with # are not processed.

If processing fails because of any incorrectness in an expression, pdr produces a message. No data transfer into the rejections takes place.

Text files that are processed successfully are deleted if they are configured data sources, so they are not processed a second time. This deletion can be suppressed in the configuration.

3.1.6. Input per CSV file

The abbreviation CSV means "comma separated values". Besides the comma, pdr also accepts the semicolon and the tabulator as separators between the values.

There are two different ways to tell pdr which comma-separated data value should go into which collection:
  1. a control line in the CSV file preceding the data lines
  2. a control line in the configuration file, valid for the entire CSV file
In the first case a pdr CSV file would have the following structure:

control line
data line1
[...]
data lineN

control line
data line1
[...]
data lineN

[...]

This use of control lines is unusual, but it gives us the desired flexibility and openness. Normally you can insert them easily by hand or with a program like sed. In the second case the CSV file contains only data lines, as expected.

A control line has the following structure:

[# pdr] datetime [separator collection]+

Example:

# pdr datetime, *, n, l; h; q»p, #            (» means a tabulator)

This is a control line for data lines with a timestamp and seven values for the collections *, n, l, h, q, p and #.

Each control line in a CSV file is recognized by its prefix # pdr; a control line in a configuration file doesn't need this prefix. The following keyword datetime marks the position of the timestamp on the data lines. It doesn't have to be at the beginning, but every line must have one - there are no data values without a timestamp. The example also shows that a single data line can use several different separators. Data lines according to this control line would look like this:

2008-10-11 12:31:38, 5.2, 7, 8; 42.3; 12»96, first measuring
2008-10-12 12:48:08, 6.1,  , 8; 53.1; 16»93,
2008-10-13 12:43:57, 5.8, 7, 7; 34.2; 15»94, third measuring

The second line has no values for the collections n and #. For missing values, simply no inserts are made.

If you have CSV files containing more values than you want to import into collections you can declare omissions in the control line:

# pdr datetime, a, b, , , , c, d, e

Here we read a timestamp and values for two collections, then skip three values on each data line, then read values for another three collections.

Lines starting with # are not processed.

The whole CSV file is processed in a single transaction. If a failure occurs, for instance because a data value on a line doesn't match the type of the declared collection, the whole file is dismissed. No data transfer into the rejections takes place.

CSV files that are processed successfully are deleted if they are configured data sources, so they are not processed a second time. This deletion can be suppressed in the configuration.
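The control-line mechanism can be illustrated with a small Python sketch (an assumption-laden illustration, not pdr's actual code): one function records the field order and the exact separators of a control line, a second applies those separators to a data line. The » of the examples above stands for a tabulator, written '\t' in code.

```python
import re

def parse_control_line(line):
    """Return (fields, separators); fields may contain '' for omissions."""
    line = re.sub(r'^#\s*pdr\s+', '', line.strip())
    parts = re.split(r'([,;\t])', line)        # capture group keeps the separators
    return [p.strip() for p in parts[0::2]], parts[1::2]

def split_data_line(line, separators):
    """Split a data line at the same separators, in the same order."""
    values = []
    for sep in separators:
        field, line = line.split(sep, 1)
        values.append(field.strip())
    values.append(line.strip())
    return values

fields, seps = parse_control_line('# pdr datetime, *, n, l; h; q\tp, #')
# fields: ['datetime', '*', 'n', 'l', 'h', 'q', 'p', '#']
# seps:   [',', ',', ',', ';', ';', '\t', ',']
row = split_data_line('2008-10-11 12:31:38, 5.2, 7, 8; 42.3; 12\t96, first measuring', seps)
# row: ['2008-10-11 12:31:38', '5.2', '7', '8', '42.3', '12', '96', 'first measuring']
```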

3.1.7. Input per XML file

pdr can read XML files for data input. These files are well-formed, readable and editable, and are ideal for data exchange between different software systems. pdr defines its own, intentionally very simple format, but the responsible part of the program is designed to be extended with further XML formats.
3.1.7.1. The pdr XML format
The pdr XML format is completely documented in the file pdr.xsd:

<?xml version="1.0" encoding="iso-8859-1" ?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >

  <xsd:annotation>
    <xsd:documentation xml:lang="en">
     pdr XML input file definition (C) T.M. Bremgarten 2010-01-31
    </xsd:documentation>
  </xsd:annotation>

  <xsd:element name="pdr">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="collection" type="collection" minOccurs="0" maxOccurs="unbounded" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:complexType name="collection">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
           <xsd:attribute name="datetime" type="xsd:string" />
           <xsd:attribute name="value" type="xsd:string" />
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
    <xsd:attribute name="name" type="xsd:string" />
  </xsd:complexType>

</xsd:schema>

This definition allows files that look like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<pdr>
    <collection name="#" type="text">
        <item datetime="2001-07-09 18:27:11" value="first measuring"/>
        <item datetime="2001-07-10 07:52:01" value="second measuring"/>
        <item datetime="2001-07-10 10:07:00" value="third measuring"/>
        [...]
    </collection>
    <collection name="*" type="numeric">
        <item datetime="2001-07-12 13:57:01" value="9.3"/>
        <item datetime="2001-07-12 14:46:45" value="5.6"/>
        <item datetime="2001-07-12 18:25:36" value="5.7"/>
        [...]
    </collection>
    <collection name="l" type="numeric">
        <item datetime="2001-07-03 21:41:58" value="7"/>
        <item datetime="2001-07-04 21:48:43" value="8"/>
        <item datetime="2001-07-05 21:50:49" value="7"/>
        [...]
    </collection>
</pdr>

This format is self-explanatory. The data of the collections are specified directly and in an easily readable form.

The whole XML file is processed in a single transaction. If a failure occurs, for instance because a data value doesn't match the type of a collection, the whole file is dismissed. No data transfer into the rejections takes place.

XML files that are processed successfully are deleted if they are configured data sources, so they are not processed a second time. This deletion can be suppressed in the configuration.
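Reading such a file in another program is straightforward. Here is a minimal, hypothetical Python sketch using the standard library's ElementTree (this is not part of pdr itself; the function name is invented):

```python
import xml.etree.ElementTree as ET

def read_pdr_xml(source):
    """Parse a pdr XML file into {collection_name: [(datetime, value), ...]}."""
    root = ET.parse(source).getroot()                 # the <pdr> element
    return {coll.get('name'): [(item.get('datetime'), item.get('value'))
                               for item in coll.findall('item')]
            for coll in root.findall('collection')}
```

A program on the other side of the exchange could use the same few lines to import a file produced by pdr's XML export (section 3.1.9).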
3.1.7.2 (more XML formats)
...

3.1.8. Interactive Mode

pdr has an interactive mode which allows you to edit, add and delete data values in the database. This is very useful for correcting errors, or for adding comments later, and it is much easier and more comfortable than executing SQL statements in the database manually.

The interactive mode is started by the command line parameter -i:

$ pdr -i
pdr 0.3.7, interactive mode (press ? for help)
2010-03-03 12:38:01 6.2 1m 6n                       >

The symbol on the right is a prompt; pdr now waits for input. On the left side of the line, the last (youngest) data row the user has created is shown. Using navigation commands you can now scroll through your data row by row and execute data manipulations. pdr's interactive mode has the following commands:

-
jump to the previous (older) data row

+
jump to the next (younger) data row

^
jump to the first (oldest) data row

$
jump to the last (youngest) data row

RET
repeat the last navigation command

D[collection+]
delete one or more parts of the data row; the parts are given as collection names separated by spaces; if no collection names are specified at all, the whole data row is deleted; examples:
D
D a b c

expression
supplement the current data row with the specified expression; if the data row already has data values in collections named in the expression, those are updated, i.e. overwritten, otherwise they are added; the timestamp of the data row cannot be changed

TAB
take the current data row as expression input; this is useful to correct a longer comment without entering the whole expression again

?
show the help screen

q
terminate the interactive mode and pdr

Example session:

$ pdr -i
pdr 0.3.7, interactive mode (press q to quit)
2010-03-03 12:38:01 6.2 1m 6n                       > -
2010-03-03 08:13:32 5.7 1m 6n 7l                    > Dm
2010-03-03 08:13:32 5.7 6n 7l                       > ; comment
2010-03-03 08:13:32 5.7 6n 7l ; comment             > D ; l n
2010-03-03 08:13:32 5.7                             > D
2010-03-02 22:10:56 5.8 10l                         > 12l 80.3k
2010-03-02 22:10:56 5.8 12l 80.3k                   > +
2010-03-03 12:38:01 6.2 1m 6n                       > q
$

3.1.9 XML-Export

With its command line parameter -X, pdr can export the contents of its database into an XML file. The format of this XML file is the one described in section 3.1.7.1, so the user has a tool for data exchange between different databases. Limiting the data before export is not planned; the XML file can easily be edited later.

3.2. pdx

3.2.1. Functionality

pdx evaluates data from collections and creates reports and diagrams using statistical functions. Reports are created from report templates containing placeholders which are later replaced by pdx. This makes it possible to create reports in almost every (text) format, for instance ASCII, XML, HTML, RTF, CSV, SQL and so on. With a little more effort even a backend for ODF would be possible. For diagrams, at the moment only SVG is supported. pdx works like this:

     report templates              reports (HTML, XML, TXT)
                      \          /
             Database -+-> pdx -+
                      /          \
  diagram definitions              diagrams (SVG, PNG)

The outputs must be configured. The essential groundwork for operating pdx is the development of the report templates and the diagram definitions. We need a bit of theory for this ...

3.2.2. Built-in functions

pdx has an extensive set of built-in functions for selecting data from the database and for its statistical evaluation, so the work becomes programmable to a large extent. The syntax is very similar to the functional programming language Lisp.

Note: It's not a real Lisp interpreter; in particular there is no way to define new functions at all. But the processing of the functions is, as in Lisp, strictly functional. The reason for this Lisp-like syntax lies in the ingeniously simple structure of this notation, which is immediately understandable on sight without learning complicated syntactical constructions.

Note: A list of all built-in functions can be shown in interactive mode with the command ?. Most of these functions can also be tested in interactive mode.

The following sections document all built-in functions with a unified notation:

( function_name function_parameter_type* ) -> result_type

This line means:
  1. every function prototype starts and ends with a round bracket
  2. every function has a name
  3. the name is followed by 0 or more parameter types
  4. every function has also a result from a given type
A function name need not be unique; it can be overloaded. Uniqueness must then be established by the given parameters (number and types).

Some functions have an open parameter list: ... (ellipsis). This means that neither the number nor the types of the parameters are fixed by the definition. These functions can be called with any parameters.

All functions are strictly typed; there are not even type conversions. If you read {int}, it really means {int} and nothing else - a {double} value leads to an error. Types are written in curly brackets to distinguish them from values. Types appear in function definitions; in function calls they are replaced by values. The following types are possible:

{int}, {double}  signed numbers                                   5, 3.14
{string}         unlimited character strings                      "Hugo"
{time}           mostly a time duration, seldom really a time     09:13
{timestamp}      a concrete point in history with date and time   2009-12-31 7:30:01
{selection}      a set of timestamp-value pairs
{color}          an RGB color in hexadecimal notation              #00FF00
{nothing}        only for function result types: the result of
                 the function is empty and cannot be evaluated


3.2.2.1. Date and time functions
Wherever a value of type {time} is needed, one of the following functions can be used:

(year)          ->  {time}
(year    {int}) ->  {time}
(years   {int}) ->  {time}
(month)         ->  {time}
(month   {int}) ->  {time}
(months  {int}) ->  {time}
(week)          ->  {time}
(week    {int}) ->  {time}
(weeks   {int}) ->  {time}
(day)           ->  {time}
(day     {int}) ->  {time}
(days    {int}) ->  {time}
(hour)          ->  {time}
(hour    {int}) ->  {time}
(hours   {int}) ->  {time}
(minute)        ->  {time}
(minute  {int}) ->  {time}
(minutes {int}) ->  {time}
(second)        ->  {time}
(second  {int}) ->  {time}
(seconds {int}) ->  {time}

These functions compute a time duration of a well-known length. The names are self-explanatory. The {int} argument is a factor, i.e. a number of such time units. If no factor is specified, the factor is 1.

Note: A year and a month are problematic because they don't have fixed lengths in reality. The pdx specifications (year) and (month) are by definition identical to (days 365) and (days 30).

The function now returns the current date and the current time:

(now)  ->  {timestamp}

Note: If you use the command line option -f, the returned value is the time the application was started, even if that now lies some seconds in the past. So all calls to now return the same timestamp if you use -f.

Note: You can define the returned value with the command line option -n. With this you can (given well-designed report templates and diagram definitions) produce reports for any time in history without any complicated configuration.

Examples follow in the next section.

Using the functions + and - you can compute relative timestamps. The functions have the following signatures:

(+ {timestamp} {time})   ->   {timestamp}
(- {timestamp} {time})   ->   {timestamp}

Mostly you will use a call to now as the first parameter, to get a timestamp a week or a month in the past. These functions are intended to be used with folding.
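In ordinary programming-language terms, + and - correspond to plain timestamp arithmetic. A small Python analogy (the concrete datetime values are made up for the example):

```python
from datetime import datetime, timedelta

now = datetime(2010, 3, 3, 12, 0, 0)        # stand-in for what (now) returns
two_weeks_ago = now - timedelta(weeks=2)    # like (- (now) (weeks 2))
a_month_later = now + timedelta(days=30)    # like (+ (now) (month)), month = 30 days
print(two_weeks_ago)                        # 2010-02-17 12:00:00
```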
3.2.2.2. Functions for data selection
The following functions retrieve a part of exactly one collection and return this as {selection}. The {string}-parameter names the collection, the other parameters limit the result by time:

(select {string})                          ->  {selection}
(select {string} {timestamp})              ->  {selection}
(select {string} {timestamp} {timestamp})  ->  {selection}
(select {string} {time})                   ->  {selection}
(select {string} {time} {timestamp})       ->  {selection}

The first form simply gets all data of the given collection; this can take some time. The second gets all data from the specified timestamp on, the third all data between the two timestamps; the second timestamp, the end, doesn't belong to the result. The fourth form gets all data in the specified time duration until now (i.e. until what a call of now would return). The fifth gets all data in the specified time duration before the specified timestamp.

Examples:

(select "*")
get all data of the default collection

(select "*" 2009-12-01-12:34)
get data of the default collection since Dec 01 2009 12:34, see Note below!

(select "n" 2009-01-01-0:00 2010-01-01-0:00)
get all data of the collection n of the year 2009
 
(select "l" (weeks 2))
get all data of the collection l of the last two weeks

(select "l" (months 3) 2009-06-01-0:00)
get all data of the collection l of the last three months before June 01 2009

Note: In some cases we need timestamps as parameters. Normally timestamps are written like CCYY-MM-DD hh:mm[:ss] - with a space in the middle. But the space is also the separator for function parameters, so we need a - (minus) here to tell pdx that the timestamp is one parameter, not two.

With the help of the function merge you can join selections. The result is a single selection:

(merge keyword ...) -> {selection}

with keyword = avg, min, max, sum, first or last

Instead of the ellipsis the function expects {selection} parameters. These selections are joined, according to their timestamps, into one single selection. It can happen that two selections each have a value for the same timestamp. In this case the keyword comes into play: it names the operation to be executed on both values to obtain a single new value. avg computes the average of both values, min takes the lesser one, max the greater one, sum adds both values, first takes the left one, last the right one. Some operations only make sense on numeric data. All involved selections must have the same type. You can also merge three or more selections. Example:

selection a                 selection b                 (merge avg (select "a") (select "b"))
--------------------        --------------------        -------------------------------------
                            2009-11-17 12:38 9.3        2009-11-17 12:38 9.3
2009-12-01 13:01 5.2                                    2009-12-01 13:01 5.2
2009-12-02 13:02 5.7                                    2009-12-02 13:02 5.7
2009-12-03 13:03 3.2                                    2009-12-03 13:03 3.2
                            2009-12-03 19:17 8.4        2009-12-03 19:17 8.4
2009-12-04 13:04 4.8                                    2009-12-04 13:04 4.8
2009-12-05 13:05 5.7        2009-12-05 13:05 4.7        2009-12-05 13:05 5.2 <- avg!
2009-12-06 13:06 5.3                                    2009-12-06 13:06 5.3
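The semantics of merge can be mimicked in a few lines of Python, modeling a selection as a dict that maps timestamp strings to values (an illustration of the behaviour described above, not pdx's implementation):

```python
# Join operations, keyed like pdx's merge keywords
OPS = {
    'avg':   lambda vs: sum(vs) / len(vs),
    'min':   min,
    'max':   max,
    'sum':   sum,
    'first': lambda vs: vs[0],
    'last':  lambda vs: vs[-1],
}

def merge(keyword, *selections):
    """Join selections by timestamp; duplicates are resolved by keyword."""
    result = {}
    for ts in sorted(set().union(*selections)):       # all timestamps, once each
        values = [s[ts] for s in selections if ts in s]
        result[ts] = OPS[keyword](values)
    return result

a = {'2009-12-05 13:05': 5.7, '2009-12-06 13:06': 5.3}
b = {'2009-11-17 12:38': 9.3, '2009-12-05 13:05': 4.7}
merged = merge('avg', a, b)   # 13:05 becomes the average of 5.7 and 4.7
```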

The function fold allows "folding" the time axis of a selection. Imagine a paper strip, and fold it in your mind so that periods of time lie on top of each other. With this technique you can compare days or months. The function fold has the following prototype:

(fold keyword1 keyword2 {selection}) -> {selection}

with keyword1 = year, month, day, hour or minute
and keyword2 = avg, min, max, sum, first or last

keyword1 names the interval the folding is based upon. If you specify day, all parts of the timestamps of the selection up to the day are removed, i.e. only the time of day remains. All values then lie on a 24-hour time axis. Example:

selection a                 (fold day avg (select "a"))
--------------------        ---------------------------
2009-12-01 13:01 5.2        9999-01-01 13:01 5.45 <- avg!
2009-12-02 13:02 5.7        9999-01-01 13:02 5.7
2009-12-03 13:03 3.2        9999-01-01 13:03 3.2
2009-12-04 13:04 4.8        9999-01-01 13:04 4.8
2009-12-05 13:01 5.7
2009-12-06 13:06 5.3        9999-01-01 13:06 5.3

The purpose of this function is, for example, to compute a daily average based on the values of several days.

Note: even the selection resulting from a fold-operation must have valid timestamps. But there is no way to assign an absolute timestamp after folding a period of time, since several timestamps lie on top of each other. That's why these timestamps get the year 9999. Depending on the interval used, other parts of these timestamps (month, day and so on) are syntactically valid as well but carry no meaning.
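
The described folding can be sketched in Python (illustrative only, assuming timestamps as "YYYY-MM-DD hh:mm" strings; this is not pdx's actual implementation):

```python
from collections import defaultdict

AGGREGATE = {"avg": lambda vs: sum(vs) / len(vs), "min": min, "max": max,
             "sum": sum, "first": lambda vs: vs[0], "last": lambda vs: vs[-1]}

# how many leading timestamp fields (year, month, day, hour, minute)
# are replaced by placeholders for each folding interval
FOLDED_FIELDS = {"year": 1, "month": 2, "day": 3, "hour": 4, "minute": 5}
PLACEHOLDER = ["9999", "01", "01", "00", "00"]

def fold(interval, op, selection):
    """Fold the time axis: drop timestamp parts up to `interval`,
    then aggregate lines that now share the same folded timestamp."""
    n = FOLDED_FIELDS[interval]
    groups = defaultdict(list)
    for ts, value in selection:
        date, time = ts.split(" ")
        fields = date.split("-") + time.split(":")
        folded = PLACEHOLDER[:n] + fields[n:]
        groups["%s-%s-%s %s:%s" % tuple(folded)].append(value)
    return [(ts, AGGREGATE[op](vs)) for ts, vs in sorted(groups.items())]
```

With the selection from the table above, the two lines at 13:01 collapse into one averaged line at 9999-01-01 13:01.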
3.2.2.3. Statistic functions
All functions of this section perform statistical calculations on a selection and again return a selection with a calculated value on each line. For example we can calculate the daily average over a month.

The scheme of the parameters is identical in all statistic functions. The following functions calculate the arithmetic average:

(avg {selection})                          ->  {selection}   (a)
(avg {selection} {time} {time})            ->  {selection}   (b)
(avg {selection} keyword)                  ->  {selection}   (c)
(avg {selection} keyword {time})           ->  {selection}   (d)
(avg {selection} keyword {time} {time})    ->  {selection}   (e)
(avg {selection} {int} {int})              ->  {selection}   (f)

Standard deviation:

(sdv {selection})                          ->  {selection}
(sdv {selection} {time} {time})            ->  {selection}
(sdv {selection} keyword)                  ->  {selection}
(sdv {selection} keyword {time})           ->  {selection}
(sdv {selection} keyword {time} {time})    ->  {selection}
(sdv {selection} {int} {int})              ->  {selection}

Count:

(count {selection})                        ->  {selection}
(count {selection} {time} {time})          ->  {selection}
(count {selection} keyword)                ->  {selection}
(count {selection} keyword {time})         ->  {selection}
(count {selection} keyword {time} {time})  ->  {selection}

The count-function always returns a {double} value on each line, regardless of the type of the selection.

Arithmetic maximum and minimum:

(max {selection})                          ->  {selection}
(max {selection} {time} {time})            ->  {selection}
(max {selection} keyword)                  ->  {selection}
(max {selection} keyword {time})           ->  {selection}
(max {selection} keyword {time} {time})    ->  {selection}

(min {selection})                          ->  {selection}
(min {selection} {time} {time})            ->  {selection}
(min {selection} keyword)                  ->  {selection}
(min {selection} keyword {time})           ->  {selection}
(min {selection} keyword {time} {time})    ->  {selection}

The functions max and min return the value which is the maximum or minimum of the selection, together with its original timestamp.

Sum:

(sum {selection})                          ->  {selection}
(sum {selection} {time} {time})            ->  {selection}
(sum {selection} keyword)                  ->  {selection}
(sum {selection} keyword {time})           ->  {selection}
(sum {selection} keyword {time} {time})    ->  {selection}

The first or last line of a selection, this means the oldest or the youngest:

(first {selection})                        ->  {selection}
(first {selection} {time} {time})          ->  {selection}
(first {selection} keyword)                ->  {selection}
(first {selection} keyword {time})         ->  {selection}
(first {selection} keyword {time} {time})  ->  {selection}

(last {selection})                         ->  {selection}
(last {selection} {time} {time})           ->  {selection}
(last {selection} keyword)                 ->  {selection}
(last {selection} keyword {time})          ->  {selection}
(last {selection} keyword {time} {time})   ->  {selection}

The functions first and last return the oldest or the youngest value of the selection, together with its original timestamp.

Note: most of these functions are only allowed on selections with numeric data. count, first and last always work. sum also works on selections with text, for instance comments; the strings are then concatenated. avg also works on selections containing ratio values. In this case the average is calculated separately for numerator and denominator.

Examples:

(avg (select "*"))
compute the average over all values of the default collection

(max (select "*") day 2:00)
get the daily maximum of the default collection, assume day change at 2:00

(sum (select "n" 3:30 9:00) day)
get the daily sum of values of the collection n, sum only values between 3:30 and 9:00

(avg (select "l" (month)) 5 5)
get the floating average over 11 values of the collection l for the last month

(first (select "*" (month)) day 2:00)
get the first line of each day of the last month from the default collection

(last (select "*" (day)) hour)
get the last line of each hour of the last day from the default collection
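
Variant (f) with the two {int}-parameters (the floating average from the example above) can be pictured as a moving window. A Python sketch, assuming the window simply shrinks at the edges of the selection (the exact edge behaviour of pdx is not documented here):

```python
def floating_avg(selection, before, after):
    """Moving average over `before` preceding values, the value itself
    and `after` following values; e.g. before=after=5 gives 11 values."""
    values = [v for _, v in selection]
    out = []
    for i, (ts, _) in enumerate(selection):
        window = values[max(0, i - before): i + after + 1]
        out.append((ts, sum(window) / len(window)))
    return out
```

Each line keeps its timestamp but carries the average of its window, which smooths the curve.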
3.2.2.4. Arithmetic functions
pdx has a small set of arithmetic functions, namely the four basic arithmetic operations. Each exists in three different implementations:

(X {double}    {double})    -> {selection}     with X = +, -, * or /
(X {selection} {double})    -> {selection}
(X {selection} {selection}) -> {selection}

The first one simply applies the operation to both numeric operands. The result is a single line. The second implementation applies the operation to each line of the {selection} and the {double} value. The result has the same number of lines and the same timestamps as the selection. These implementations are especially intended for unit conversions.

The third implementation is a bit more complex. It allows the line-by-line combination of two selections. For this the timestamps are compared. The numbers of lines in the two selections need not be equal. If the second selection has no line with the same timestamp as a line in the first selection, the last older value is taken. The result has as many lines as the first selection:

selection a                 selection b                                    (* (select "a") (select "b"))
--------------------        --------------------                           -----------------------------
                            2009-11-17 12:38 9.3
2009-12-01 13:00 5.2                                ->   5.2 * 9.3 =       2009-12-01 13:00 48.36
2009-12-02 13:00 5.7                                ->   5.7 * 9.3 =       2009-12-02 13:00 53.01
2009-12-03 13:00 3.2                                ->   3.2 * 9.3 =       2009-12-03 13:00 29.76
                            2009-12-03 19:17 8.4
2009-12-04 13:00 4.8                                ->   4.8 * 8.4 =       2009-12-04 13:00 40.32
2009-12-05 13:00 5.7        2009-12-05 13:00 4.7    ->   5.7 * 4.7 =       2009-12-05 13:00 26.79
2009-12-06 13:00 5.3                                ->   5.3 * 4.7 =       2009-12-06 13:00 24.91

The timestamps of the result are those of the first {selection}-parameter. As you can see we also need a valid line in the second selection for the first line of the first selection; pdx reports an error if this condition is not fulfilled. This implementation is very useful if you have two collections that form numerator and denominator of a fraction, for instance specific fuel consumption per distance.
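
The line-by-line combination can be sketched like this (illustrative Python, not pdx code): for each line of the first selection, the value of the second selection at the same timestamp, or the last older one, is looked up.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def combine(op, sel_a, sel_b):
    """Combine two selections line by line; sel_b must be sorted by timestamp."""
    result = []
    for ts, a in sel_a:
        older = [v for t, v in sel_b if t <= ts]
        if not older:
            # no valid line in the second selection for this timestamp
            raise ValueError("no value in second selection at or before " + ts)
        result.append((ts, OPS[op](a, older[-1])))  # last older (or equal) value
    return result
```

Zero-padded "YYYY-MM-DD hh:mm" timestamp strings compare correctly as plain strings, which is what the lookup relies on here.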
3.2.2.5. Functions for reports
The functions of this section are needed for the creation of reports. They return a string, often a large block of text. pdx reads the report template, finds a call of the format-function, evaluates it immediately and replaces it with the function's result at the same position in the text. These functions can be tested in interactive mode, too.

The format-function is complex even though its prototype looks simple:

(format ...)      ->  {string}

The function expects just a list of parameters consisting of text, function results, format specifications and keywords. The result is a piece of text with one or more lines. The best way to understand this very central function is to take a look at examples.

Example 1:

(format
    (avg (select "*" (days  7)))     <1.2>
)

This call creates a single, formatted value. The first argument of the format-function is here the result of a statistic calculation (avg-function) which returns a single-line selection with a numeric value on it. After this follows a format specification in angle brackets which means: the result should have at least one digit before and always two digits after the decimal point.

It gets much more complex with multiple selections, perhaps with multiple lines and differing numbers of lines. But this is where the true strength of the format-function shows.

Example 2:

(format
    "<tr>"
    "<td>"   datetime                          "</td>"
    "<td>"   (select "*" (days 7))    <1.1>    "</td>"
    "<td>"   (select "n" (days 7))    <1>      "</td>"
    "<td>"   (select "l" (days 7))    <1>      "</td>"
    "<td>"   (select "m" (days 7))    <1.0>    "</td>"
    "<td>"   (select "x" (days 7))    <1.1>    "</td>"
    "<td>"   (select "#" (days 7))             "</td>"
    "</tr>"
    newline

)

This call creates HTML rows for an HTML table containing data from the collections *, n, l, m, x and # of the last seven days. (The table definition is not part of the function call.)

First the format-function analyses the selections and joins them into a hidden table. Then this table is formatted line by line, value by value.

Note: such constructs can look much more complicated; it's worth writing them neatly and using spaces for alignment so that one can see what's going on.

The result of such a function call is real HTML and looks like this:

[...]
<tr><td>2009-01-17 18:58:13</td><td></td><td>6</td><td></td><td></td><td></td><td></td></tr>

<tr><td>2009-01-17 21:42:49</td><td>5.6</td><td></td><td>16</td><td></td><td></td><td></td></tr>
<tr><td>2009-01-18 05:54:41</td><td>6.8</td><td>7</td><td>8</td><td>1</td><td></td><td></td></tr>
<tr><td>2009-01-18 12:17:22</td><td>5.4</td><td>6</td><td></td><td>1</td><td></td><td></td></tr>
[...]

The number of created lines depends on the selections alone. The data values come from the selections, everything else from the string parameters of the format-function. If the hidden table has an empty value on a line, the format-function creates no output for it. This leads to an empty field in the table: <td></td>, which by the way is often not interpreted very well by many browsers. It would be better to use <td><br></td> for empty fields.

So occasionally we have the problem of printing something visible for empty values. Using the empty-function

(empty {string})  ->  {string}

one can tell the format-function what string to print instead of nothing.

Example:

(format
    (empty "nil")
    [...]
)

This will print nil for every empty value.

The following small and parameterless functions are very simple:

(build)           ->  {string}
(version)         ->  {string}
(database)        ->  {string}

build returns a string containing pdx build information, while version returns the current version of pdx. These values can be included in reports to show which version of pdx created the report. database shows the version of the current database.
3.2.2.6. Functions for diagrams
The functions of this section create diagrams. They don't return anything, they draw a diagram. That's why they cannot be tested in interactive mode.

The diagram-function is a container. All other diagram functions must be called as parameter of the diagram-function only. As expected diagram has an open list of parameters:

(diagram {int} {int} {color} ...)                                  ->  {nothing}

The two {int}-parameters are the wanted width and height of the diagram in pixels.

Note: these numbers do not include the labels on the axes, only the inner area of the diagram. The resulting diagram is indeed bigger because of the labels. The reason for this is the difficult computation of text sizes in SVG graphics.

The third parameter is the background color of the diagram.

Example:

(diagram 400 300 #FDFDFD
    [...]
)

The axes-function draws an entire coordinate system (1st quadrant):

(axes {timestamp} {timestamp} {double} {double} {double} {color})  ->  {nothing}
(axes {time}      {timestamp} {double} {double} {double} {color})  ->  {nothing}
(axes {time}                  {double} {double} {double} {color})  ->  {nothing}
(axes keyword                 {double} {double} {double} {color})  ->  {nothing}

with keyword = year, month, day, hour or minute

The first implementation draws the x-axis from the first to the last timestamp, the second for the specified time duration before the given timestamp, the third for the specified time duration before now. The fourth implementation is specialized for drawing data resulting from a call to the fold function; you should use the same interval there. The axes then compute everything that's needed. The step width of the labels on the x-axis is calculated internally. The following three {double}-values are the lower limit, upper limit and step width of the y-axis. The {color}-parameter sets the color of the axes and the labels.

Example:

(axes 2009-08-01-0:00 2009-09-01-0:00   2.0 10.0 1.0   #000000)
(axes (months 3)                        7.0 15.0 0.5   #101010)

Using the hline-function one can insert very handy horizontal lines into the diagram:

(hline {double}          {color})                                  ->  {nothing}
(hline {double} {double} {color})                                  ->  {nothing}

The first {double}-parameter is the position of the line on the y-axis, the optional second one is the thickness of the line. The {color}-parameter sets the color of the line.

Example:

(hline 7.0 0.25 #101010)

You can use the vline-function to draw vertical lines. This function exists in four different versions:

(vline {timestamp}          {color})
(vline {timestamp} {double} {color})
(vline {time}               {color})
(vline {time}      {double} {color})

The first two implementations use an absolute timestamp, the last two a relative, recurring time for folded time axes. The {double}-parameter is the thickness, the {color}-parameter the color of the line.

Example:

(vline 5:45 0.25 #101010)           folded time axis!

The last and most important diagram function is the curve-function. This function draws a curve in different styles. This curve is always based upon a selection and has a color:

(curve {selection} {color} ...)                                    ->  {nothing}

Without any further parameters the curve-function draws a zigzag line in the specified color just by connecting the data values of the selection. With the help of additional parameters this behaviour can be changed:
In bar graphs one can draw multiple bars per aggregation interval using two {int}-parameters. This sounds difficult but is easy to understand with an example: say we have values for four different, abstract day times, "in the morning", "at noon", "in the evening" and "late", and we want four bars per day representing the values at these times. In this case the bars must be drawn so that they don't overlap:

(curve   (sum (select "n" (month 1)) day  3:30  9:30)    #FF1000    bars 1 4)
(curve   (sum (select "n" (month 1)) day 11:00 14:30)    #FF5000    bars 2 4)
(curve   (sum (select "n" (month 1)) day 17:30 20:30)    #FF9000    bars 3 4)
(curve   (sum (select "n" (month 1)) day 21:00  2:00)    #FFB000    bars 4 4)

These four lines differ in the selections, the colors of the bars and the first {int}-parameter. This one is the number of the bar; the second {int}-parameter says how many bars there are. So the first line draws the first bar of four. pdx computes how wide a single bar must be drawn; in the example every bar gets a quarter of the width of a day on the x-axis. You can play with this: you don't have to draw every bar, so you can also create gaps between bars.
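
The geometry behind this can be sketched with a small helper (hypothetical; pdx computes this internally, the function below only illustrates the quarter-width rule from the text): bar number i of n occupies the i-th n-th of the interval width.

```python
def bar_slot(x0, interval_width, bar_no, bar_count):
    """Left x-position and width of bar `bar_no` out of `bar_count`
    within one aggregation interval starting at x0."""
    width = interval_width / bar_count
    return x0 + (bar_no - 1) * width, width
```

With four bars and a day that is 40 pixels wide, bar 2 starts at offset 10 and is 10 pixels wide; skipping a bar number leaves a gap.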
3.2.2.7. Other functions
pdx has exactly one function that is specific to diabetics: the HbA1c-function:

(HbA1c {string})                          ->  {selection}
(HbA1c {string} {timestamp})              ->  {selection}
(HbA1c {string} {timestamp} {timestamp})  ->  {selection}
(HbA1c {string} {time})                   ->  {selection}
(HbA1c {string} {time} {timestamp})       ->  {selection}

The function computes an HbA1c-value in percent using a very simple approximation. For this you need blood sugar values for at least three months; this means if you want to draw a curve over a month you need data from at least the last four months. The computed value is definitely not very exact, but you can see the fluctuations in the curve. The parameters are similar to those of the select-function. The first implementation computes the value of today, the second the value at the specified timestamp.

Each HbA1c-function exists in a second implementation:

(HbA1c2 {string})                          ->  {selection}
(HbA1c2 {string} {timestamp})              ->  {selection}
(HbA1c2 {string} {timestamp} {timestamp})  ->  {selection}
(HbA1c2 {string} {time})                   ->  {selection}
(HbA1c2 {string} {time} {timestamp})       ->  {selection}

These implementations give younger values a higher weight than older ones, assuming that this better matches the natural behaviour. The resulting curve wobbles a bit more than the one from the implementations above.

3.2.3. Interactive mode

pdx has an interactive mode. This mode is very useful for testing function calls before you put them into report templates or diagram definitions. And you can execute short queries, for instance: how many values are there in collection x? What was the all-time maximum? The interactive mode is started by the command line option -i. pdx shows a prompt and waits for input:

$ pdx -i
pdx 0.3.1 (2010-01-03 16:43:29 on castor, GNU/Linux 2.6.32-ARCH x86_64)
>

At this prompt there are two instructions, ? and q, every other input will be interpreted as function call.

The instruction ? lists implementations of built-in functions. Without any parameters ? shows all known built-in functions with their parameter types and their return type. ? accepts a regular expression which can be used to narrow the result:

> ?min    show all implementions of the min-function
> ?a.*    show all functions beginning with a
> ?m..    show functions beginning with m and having two more characters

The instruction q terminates the interactive mode and also pdx. This can also be achieved by Ctrl-D or Ctrl-C.

The call of a function shows the result immediately. A call like

> (select "*")

shows all current values of the default collection.

3.2.4. All together: the creation of reports and diagrams

Reports are created from report templates. pdx searches these templates for sections with pdx function calls, mostly calls of the format-function. Such sections are cut out of the template, evaluated, and replaced by the result. A report template can contain multiple sections with function calls.

Report templates are either plain ASCII text or text in a formatting language like HTML or XML or text in a programming language like C or SQL. We call this the host language. The choice of host language is not limited in any way, but pdx must know how to find the sections with the function calls. That's why these sections are placed in comments of the host language, for example between <!-- and --> in HTML or XML, or between /* and */ in C. This way the template remains an incomplete but syntactically correct file of its type, which still allows the use of tools like HTML or XML editors. It's wise to "mark" such pdx-comments a bit further to distinguish them from other comments in the file. You could use indications like <!--- and ---> or /** and **/. A complete, small HTML template file containing pdx-instructions would look like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html lang="de-ch">
<head>
    <meta http-equiv="CONTENT-TYPE" content="text/html; charset=iso-8859-1">
    <title>MyTitle</title>
</head>

<body style="direction: ltr;" lang="de-DE">

<!--- (now) --->, <b>pdx</b> <!--- (version) ---> (<!--- (build) --->)                          *

</body>
</html>

In the line marked with * we see three small pdx-sections each one with a call to one of the functions now, version and build. The output this line creates looks like this:

2009-12-27 15:14:51, pdx 0.3.0 (2009-12-27 10:28:14 on castor, GNU/Linux 2.6.31-ARCH x86_64)

One can see that all parts of the line that are not placed in pdx-sections (and also the whole surrounding text) are transferred unchanged into the output. By the way, this line could also be written using the format-function:

<!--- (format   (now)   ", <b>pdx</b> "   (version)   " ("   (build)   ")") --->
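
The extraction of pdx-sections can be sketched as follows (illustrative Python using the <!--- ... ---> markers suggested above; the function name and the splitting approach are hypothetical, not pdx internals):

```python
import re

# a pdx section is everything between the markers <!--- and --->
PDX_SECTION = re.compile(r"<!---(.*?)--->", re.DOTALL)

def split_template(text):
    """Split a template into its static parts and the embedded pdx calls."""
    calls = [m.group(1).strip() for m in PDX_SECTION.finditer(text)]
    static = PDX_SECTION.split(text)[::2]  # every other piece is static text
    return static, calls
```

pdx would then evaluate each call and interleave the results with the static parts, which is exactly the behaviour shown in the example line above.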

To fill a complete HTML-table with values we could write a template file like this:

[...]

<table style="page-break-before: avoid; page-break-inside: avoid;
             width: 800px;" border="1" cellpadding="1" cellspacing="1">

  <tbody>
    <tr valign="top">
      <td>Datum/Zeit</td>
      <td>*</td>
      <td>n</td>
      <td>l</td>
      <td>m</td>
      <td>x</td>
      <td>Kommentar</td>
    </tr>
<!---
(format
    (empty "<br>")
    "<tr valign=top>"
    "<td>"    datetime                         "</td>"
    "<td>"    (select "*" (days 7))   <1.1>    "</td>"
    "<td>"    (select "n" (days 7))   <1>      "</td>"
    "<td>"    (select "l" (days 7))   <1>      "</td>"
    "<td>"    (select "m" (days 7))   <1.0>    "</td>"
    "<td>"    (select "x" (days 7))   <1.1>    "</td>"
    "<td>"    (select "#" (days 7))            "</td>"
    "</tr>" newline
)
--->
  </tbody>
</table>

[...]

The table starts outside the pdx-section and also ends outside it. But the rows of the table (except the header) are generated entirely by pdx. These rows all have the same structure.

Diagrams are made from diagram definition files. These files always contain exactly one diagram definition, this means exactly one call of the diagram-function. In diagram definition files comment indications are not necessary because there is no host language. A complete diagram definition would look like this:

(diagram 400 300 #FFFDFD

    (axes (month 1) 3.0 9.0 1.0 #0)

    (hline 5.0 #C0C0C0)
    (hline 6.0 #C0C0C0)
    (hline 7.0 #C0C0C0)

    (curve (sum (select "*" (month 1)) day  3:30  9:30)    #FF0000)
    (curve (sum (select "*" (month 1)) day 11:00 14:30)    #00FF00)
    (curve (sum (select "*" (month 1)) day 17:30 20:30)    #0000FF)
    (curve (sum (select "*" (month 1)) day 21:00  2:00)    #FFFF00)
    (curve (avg (select "*" (month 1)) day        2:00)    #0       1.0)
)

While developing new report templates or diagram definitions it is wise to take existing files as a base and modify them step by step.


4. Invocation

4.1. pdr

pdr accepts options and arguments.

Note: options can have arguments themselves; don't confuse these with the arguments of pdr.

Options start with a minus character. A second minus indicates long options with readable names.

4.1.1. Arguments

Everything that follows the program name pdr on the command line and does not begin with a minus counts as an argument of pdr. All arguments are concatenated into one expression and evaluated as one:

$ pdr 5.2 5n 8l -v \; this is comment up to the end of the line

The resulting expression built from arguments is:

5.2 5n 8l ; this is comment up to the end of the line

-v does not belong to the resulting expression; it's a known option of pdr. The backslash in front of the semicolon is special to Unix-like operating systems: the shell would interpret the semicolon itself, but we use it as comment delimiter. To avoid a conflict we must put a backslash there. The backslash is removed by the shell and truly does not get into the input of pdr.

4.1.2. Options

-?
show a help screen

-V
show the pdr version

-v
show what is being done, without this option, pdr shows
only errors

-c filename
use filename as configuration file, this option
supersedes the standard configuration file ~/.pdrx

-l
list all known collections in the database and some statistics

-a "name,type"
add a collection to the database, the argument is a string,
containing name and type of the new collection, types
are n, r and t (for numeric, ratio or text)

-d name
delete a collection, the argument is the name of
the collection

-D
delete all collections, the collections * and # are not
deleted but will be cleared completely

-r
list all known rejections

-R
delete all current rejections

-e "expr"
evaluate an expression, the argument should be a
complete expression

-t filename
import a text file into the database

-C filename
import a CSV file into the database

-x filename
import a XML file into the database

-n
do not use any of the configured data sources,
use the command line only

-i
start pdr in interactive mode

-X filename
export the contents of the entire database into an XML file,
this file is compatible with the one used by -x

4.1.3. Examples

First the options for handling collections. -l or --list-collections shows all known collections and some statistic data:

$ pdr -l
  name   type     table  recs    first                last
  #      text     C1     160     2008-11-25 18:45:00  2010-01-02 21:55:37
  *      numeric  C0     1636    2008-11-25 05:00:00  2010-01-03 12:10:00
  h      numeric  C6     1       2009-05-19 16:00:00  2009-05-19 16:00:00
  l      numeric  C3     707     2008-11-25 05:00:00  2010-01-03 06:26:01
  m      numeric  C4     612     2009-03-04 05:00:00  2010-01-03 06:26:01
  n      numeric  C2     1275    2008-11-25 05:00:00  2010-01-03 12:10:00
  x      numeric  C5     119     2009-03-22 09:28:09  2010-01-03 10:31:01

This table shows name and type of every collection, the physical SQL table in the database, the number of records and the first and last timestamps.

You can add a collection using -a or --add-collection:

$ pdr -a "k,n"

$ pdr -l
  name   type     table  recs    first                last
  #      text     C1     160     2008-11-25 18:45:00  2010-01-02 21:55:37
  *      numeric  C0     1636    2008-11-25 05:00:00  2010-01-03 12:10:00
  h      numeric  C6     1       2009-05-19 16:00:00  2009-05-19 16:00:00
  k      numeric  C7     0
  l      numeric  C3     707     2008-11-25 05:00:00  2010-01-03 06:26:01
  m      numeric  C4     612     2009-03-04 05:00:00  2010-01-03 06:26:01
  n      numeric  C2     1275    2008-11-25 05:00:00  2010-01-03 12:10:00
  x      numeric  C5     119     2009-03-22 09:28:09  2010-01-03 10:31:01

The argument contains the name of the new collection, a comma and then the type written as n (for numeric), r (for ratio) or t (for text).

A collection that is not needed anymore can be deleted using -d or --delete-collection:

$ pdr -d k

You can delete all collections at once with -D or --delete-all-collections. After this you will still have two remaining, empty collections, * and #:

$ pdr -D

$ pdr -l
  name   type     table  recs    first                last
  #      text     C1     0
  *      numeric  C0     0

There are two options for handling rejections. Using -r or --list-rejections rejected data can be listed:

$ pdr -r
  timestamp            expression
  2010-01-03 17:46:20  12.0k                               (error here: the collection k doesn't exist)

If you have rejections you should check the rejected expressions. If you can reconstruct the correct expression, you can enter it again on the command line. After correcting all the rejections you can delete them all at once:

$ pdr -R

The option -e or --expression allows the specification of exactly one expression for input:

$ pdr   -e "5.2"   -e "2009-12-31 17:28:03 7.9"

This option can be given multiple times; the expressions then remain independent of each other.

The option -n or --none skips all configured data sources. This is useful when invoking pdr many times within a short period, for instance for executing expressions. Many mail servers don't want you to log in many times in a short time; often this is limited to a concrete number per minute. Establishing a POP3 connection also costs time. Using -n you work entirely on your local database.

4.2. pdx

pdx has no arguments, just options.

Options start with a minus character. A second minus indicates long options with readable names.

4.2.1. Options

-?
show the help screen

-V
show the pdx version

-v
show what's going on, without this option pdx
reports errors only

-c filename
use filename as configuration file, this option
supersedes the standard configuration file ~/.pdrx

-n timestamp
define the value returned by the function now

-i
start pdx in interactive mode

-f
run in fast mode

4.2.2. Examples

The option -n or --now allows the specification of the value returned by the function now. Wherever now is called, explicitly or implicitly, this timestamp is used instead of the default one. This is very handy for creating reports and diagrams for a concrete time in the past. A precondition is that the report templates and/or diagram definitions don't use fixed timestamps themselves. Time specifications in the argument are optional; the specification is filled up with zeros, so the following invocations do the same:

$ pdx -n 2009-10-01
$ pdx -n 2009-10-01-00:00
$ pdx -n 2009-10-01-00:00:00
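
The zero-filling can be pictured like this (a hypothetical sketch of the behaviour described above, not pdx code):

```python
def fill_timestamp(spec):
    """Pad a partial -n argument with zeros to a full timestamp."""
    template = "0000-00-00-00:00:00"
    return spec + template[len(spec):]
```

All three invocations above end up with the same timestamp, 2009-10-01-00:00:00.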

The option -f or --fast improves the speed of the program (probably a lot). This option causes internal results to be cached and reused. This costs memory but accelerates the program noticeably. You should use -f whenever possible and even put it into the configuration file ~/.pdrx. It should not have any impact on the results.


5. Configuration

pdr and pdx need some configuration because their behaviour can be adjusted in many ways. All these settings are placed in a local configuration file named ~/.pdrx. This file has four sections:
  1. general options
  2. database options
  3. input options (pdr specific)
  4. output options (pdx specific)
The order of the sections or their rows in the file doesn't matter.

5.1. General options

In this section two things are configured: basic settings and the lists of inputs and outputs. The basic settings are similar to command line options. It doesn't make sense to mirror every command line option here, but some are indeed useful:

verbose = true

This line makes both pdr and pdx verbose. Otherwise they report nothing but errors. Setting verbose = true is recommended.

fast = true

Let pdx always run in fast mode. (also recommended)

If you don't use pdx to create reports and diagrams but just as a database front end, you can make it always run in interactive mode:

interactive = true

The line

encoding = UTF-8

sets the default encoding which is used if no better specification is available. This option is responsible for handling text correctly (for instance comments, especially text with German umlauts). On a modern system you will use UTF-8 or ISO-8859-1, depending on what your shell is using. pdr allows ASCII, UTF-8, UTF-16, ISO-8859-1, ISO-8859-15 and Windows1252.

The configuration of inputs and outputs is important:

inputs = e-mail-mailbox, file1, file2, file3
outputs = report1, diagram1, diagram2, diagram3, diagram4

The first line defines inputs for pdr, here four data sources, namely e-mail-mailbox, file1, file2 and file3, which are processed one after the other in this order. What these names stand for is configured later, see input options. The second line names five outputs for pdx in the same manner, namely a report and four diagrams. They are configured later, see output options.

5.2. Database options

Here we define everything related to the database.

5.2.1. SQLite

database.type = sqlite
database.connect = ~/local/share/my_data.db

The first line defines the database to be a SQLite database. The second line contains the complete connection string; in the case of SQLite this is just the name of the database file. Since these are personal applications, the database is intended to be placed somewhere in the user's home directory. The physical creation of the database is not a task of pdr or pdx; the user should do this with tools of the database or the operating system. For SQLite this is simple:

$ cat > my_data.db      (terminate with Ctrl-D)

This command creates a 0-byte file which can be used as an empty database. pdr creates the schema on its first run.
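Alternatively, the same empty file can be created with touch (using the database file name from the example configuration above):

```shell
# create an empty 0-byte file that SQLite will accept as an empty database
touch my_data.db
```

Both variants are equivalent; pdr only needs an existing, empty file at the configured path.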

5.2.2. MySQL

database.type = mysql
database.connect = user=my_db_user_name;password=my_db_user_password;db=my_db_name;compress=true;auto-reconnect=true

The first line defines the database to be a MySQL database. The second line contains the complete connection string, a list of key-value pairs. There are two preconditions:
  1. The database must exist, i.e. it has to be created by a database administrator, who also gives it a name that is unique on the database server, for example pdrx. On servers used by several users it would be wise to create several user-specific databases distinguishable by name.
  2. The user (a user of the database server, not of the operating system) must exist and must have the right to create, delete, select and manipulate tables.
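For illustration, the administrator's part might look like the following sketch, using the placeholder names from the connection string above (the exact privilege list and account syntax depend on your MySQL setup):

CREATE DATABASE my_db_name;
CREATE USER 'my_db_user_name'@'localhost' IDENTIFIED BY 'my_db_user_password';
GRANT CREATE, DROP, ALTER, SELECT, INSERT, UPDATE, DELETE
    ON my_db_name.* TO 'my_db_user_name'@'localhost';

After that, the connection string from the example above can be used as-is.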

5.3. Input options

If you want to query the same data sources again and again it will be useful to configure them in the configuration file.

5.3.1. Configuration of a POP3-mailbox

To configure a POP3-mailbox (named e-mail-mailbox here) you need the following settings:

e-mail-mailbox.type     = pop3
e-mail-mailbox.server   = pop.gmx.net
e-mail-mailbox.account  = MyAccount@gmx.net
e-mail-mailbox.password = MyPassword
e-mail-mailbox.subject  = Q
e-mail-mailbox.keep     = yes

The first line defines that e-mail-mailbox is a POP3 mailbox. The next three lines are self-explanatory. The fifth line names the e-mail subject which pdr uses to identify the relevant e-mails on the server. Only mails carrying this subject are processed; all others are ignored. This way you don't have to allocate a special new mailbox for pdr, you can use an existing one. Note: you must enter this subject in every data mail. If you send a lot of data mails you will enter it very often, so a short subject is best, even a single letter, but it has to be unique. The last line determines whether a processed e-mail should be deleted or not. The option accepts true, false, yes and no. In the case of true or yes the e-mail is not deleted on the server. If you don't use this option, processed mails will be deleted.

5.3.2. Configuration of a text file

If you want to use a text file for input you need the following configuration:

file1.type     = txt
file1.filename = ~/my_file.txt
file1.encoding = ISO-8859-1
file1.keep     = true

The first line defines file1 to be a text file input. The second line names the file, the third one the encoding of the file. If you do not name an encoding here the default encoding from the general options will be used. The last line determines whether a processed text file should be deleted or not. The option accepts true, false, yes and no. In the case of true or yes the file is not deleted. If you don't use this option processed files will be deleted.

filename may contain wildcards (* and ?) to process a file whose name is not fully known or changes frequently, or even an entire group of files. The path must be complete, but the file name can include something like *.txt to process all files of a directory at once.
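For example, to process every text file whose name starts with pdr- in a (hypothetical) incoming directory, one could write:

file1.filename = ~/incoming/pdr-*.txt

All matching files are then processed in one run of pdr.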

5.3.3. Configuration of a CSV file

If you want to use a CSV file for input you need the following configuration:

file2.type      = csv
file2.filename  = ~/my_file.csv
file2.encoding  = ISO-8859-1
file2.ctrl_line = datetime, x, y, z
file2.keep      = false

The first line defines file2 to be a CSV file input. The second line names the file, the third one the encoding of the file. If you do not name an encoding here the default encoding from the general options will be used. The option ctrl_line specifies, if needed, a control line for the entire CSV file; the CSV file itself then doesn't have to contain a control line. The last line determines whether a processed CSV file should be deleted or not. The option accepts true, false, yes and no. In the case of true or yes the file is not deleted. If you don't use this option, processed files will be deleted.
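With such a configuration the CSV file only has to contain data lines matching the configured control line datetime, x, y, z. Purely as an illustration (the actual value and date formats are described in section 3.1.6, Input per CSV files), such a file might look like:

2010-05-01 08:30, 72.5, 120, 80
2010-05-02 08:45, 72.3, 118, 79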

filename may contain wildcards (* and ?) to process a file whose name is not fully known or changes frequently, or even an entire group of files. The path must be complete, but the file name can include something like *.csv to process all files of a directory at once.

5.3.4. Configuration of an XML file

If you want to use an XML file for input you need the following configuration:

file3.type     = xml
file3.filename = ~/my_file.xml
file3.keep     = no

The first line defines file3 to be an XML file input. The second line names the file. We don't need an encoding here; the XML file has its own encoding specification. The last line determines whether a processed XML file should be deleted or not. The option accepts true, false, yes and no. In the case of true or yes the file is not deleted. If you don't use this option, processed files will be deleted.

filename may contain wildcards (* and ?) to process a file whose name is not fully known or changes frequently, or even an entire group of files. The path must be complete, but the file name can include something like *.xml to process all files of a directory at once.

5.4. Output options

5.4.1. Configuration of a report

To configure a report you need the following settings:

report1.type          = report
report1.comment_begin = "<!---"
report1.comment_end   = "--->"
report1.input_file    = input/report1.html
report1.output_file   = output/report1.html
report1.encoding      = ISO-8859-1

The first line defines report1 to be a generated report. The next two lines declare the comment indications used by pdx to identify code blocks with function calls in the report template. The fourth and fifth lines name the input and output file. The last line names the encoding of the created file; it is only needed for file types that don't carry an encoding specification themselves.
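To illustrate, a fragment of the HTML template input/report1.html could embed a pdx code block between the configured comment indications. The actual function calls are described in section 3.2.2; the block below is just a placeholder:

<h1>My report</h1>
<!---
    ... pdx function calls go here ...
--->

Everything between comment_begin and comment_end is executed by pdx and replaced by its output in output/report1.html; the rest of the template is copied unchanged.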

5.4.2. Configuration of a diagram

The configuration of a diagram is similar to the configuration of a report but simpler:

diagram1.type        = diagram
diagram1.input_file  = input/diagram1.tmpl
diagram1.output_file = output/diagram1.png
diagram1.antialias   = true

We don't have to specify comment indications. The last option, antialias, is only valid for PNG files. If it is set, the Cairo library creates antialiased images, which normally look much better if the diagram contains zigzag lines.