online_duration.py
==================

View/visualize the amount of time people spend online.

Usage
-----

Run from the top-level directory using `python -m`:

    > python -m bin.online_duration -h
    usage: online_duration.py [-h] [--grouping {user,date,weekday,hour}]
                              [--input-format {csv,log,null}]
                              [--output-format {csv,json,plot}] [--from DATE_FROM]
                              [--to DATE_TO]
                              input [output]

This script additionally requires [matplotlib] to be installed.

The script analyzes the database produced by [track_status.py] and calculates the
total amount of time people spent online.
For example (assuming the database in "db.csv" was previously generated by
[track_status.py]):

    > python -m bin.online_duration db.csv
    89497105,John,Smith,john.smith,0:12:31
    3698577,Jane,Smith,jane.smith,1:34:46

In the example above, "John Smith" and "Jane Smith" spent approx. 13 and 95
minutes online respectively.
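If you want to post-process the CSV output in Python, here is a minimal sketch. It assumes two things about the output: the column order shown above (uid, first name, last name, screen name, duration) and the `H:MM:SS` duration form. The helper names are made up for illustration. The sketch reproduces the rounded minute figures from the paragraph above.

```python
import csv
import io
from datetime import timedelta


def parse_duration(text):
    """Convert an "H:MM:SS" string (as printed in the output above) to a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def minutes_online(csv_text):
    """Map each screen name to its online time, rounded to whole minutes."""
    result = {}
    for uid, first_name, last_name, screen_name, duration in csv.reader(io.StringIO(csv_text)):
        result[screen_name] = round(parse_duration(duration).total_seconds() / 60)
    return result


# Sample output copied from the example above.
OUTPUT = """\
89497105,John,Smith,john.smith,0:12:31
3698577,Jane,Smith,jane.smith,1:34:46
"""

print(minutes_online(OUTPUT))
```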

The output format is CSV (comma-separated values) by default.
You can also get a JSON document:

    > python -m bin.online_duration --output-format json db.csv
    [
       {
          "uid": 89497105,
          "first_name": "John",
          "last_name": "Smith",
          "screen_name": "john.smith",
          "duration": "0:12:31"
       },
       {
          "uid": 3698577,
          "first_name": "Jane",
          "last_name": "Smith",
          "screen_name": "jane.smith",
          "duration": "1:34:46"
       }
    ]
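The JSON document can be consumed in a similar way. The sketch below sums the `duration` fields of the sample document above. Its helper names are hypothetical, and it assumes the same `H:MM:SS` form. The two values, 0:12:31 and 1:34:46, add up to 1:47:17.

```python
import json
from datetime import timedelta


def parse_duration(text):
    """Convert an "H:MM:SS" string to a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def total_duration(json_text):
    """Sum the "duration" field of every entry in the JSON output."""
    entries = json.loads(json_text)
    return sum((parse_duration(entry["duration"]) for entry in entries), timedelta())


# The sample document from above, compacted.
DOCUMENT = """[
  {"uid": 89497105, "first_name": "John", "last_name": "Smith",
   "screen_name": "john.smith", "duration": "0:12:31"},
  {"uid": 3698577, "first_name": "Jane", "last_name": "Smith",
   "screen_name": "jane.smith", "duration": "1:34:46"}
]"""

print(total_duration(DOCUMENT))
```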

The durations are calculated on a per-user basis by default.
You can change that by supplying either `date` (to group by dates), `weekday`
(to group by weekdays) or `hour` (to group by day hours) as the `--grouping`
parameter value.
For example, assuming that both Jane and John spent their time online on Friday,
June 17, 2016:

```
> python -m bin.online_duration --output-format json --grouping date db.csv
[
   {
      "date": "2016-06-17",
      "duration": "1:47:17"
   }
]
```

```
> python -m bin.online_duration --output-format csv --grouping weekday db.csv
Monday,0:00:00
Tuesday,0:00:00
Wednesday,0:00:00
Thursday,0:00:00
Friday,1:47:17
Saturday,0:00:00
Sunday,0:00:00
```
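The script's own grouping code isn't shown here. As a rough, hypothetical illustration of weekday grouping, the sketch below assumes the online periods are available as `(start, end)` pairs of UTC datetimes. This is not necessarily how [track_status.py] stores them. The sketch splits each period at midnight and zero-fills days with no activity, as in the weekday output above. The sample periods are invented so that they add up to the same 1:47:17 on Friday.

```python
from datetime import datetime, time, timedelta, timezone

# Fixed English names, Monday first (matches datetime.weekday(): Monday == 0).
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def duration_by_weekday(sessions):
    """Sum online time per weekday from hypothetical (start, end) datetime pairs."""
    totals = {name: timedelta() for name in WEEKDAYS}  # zero-filled, Monday..Sunday
    for start, end in sessions:
        while start < end:
            # Split the period at the next midnight so each piece lies within one day.
            next_midnight = datetime.combine(
                start.date() + timedelta(days=1), time(), tzinfo=start.tzinfo)
            piece_end = min(end, next_midnight)
            totals[WEEKDAYS[start.weekday()]] += piece_end - start
            start = piece_end
    return totals


utc = timezone.utc
# Hypothetical periods on Friday, June 17, 2016 (0:12:31 and 1:34:46 long).
sessions = [
    (datetime(2016, 6, 17, 4, 0, 0, tzinfo=utc), datetime(2016, 6, 17, 4, 12, 31, tzinfo=utc)),
    (datetime(2016, 6, 17, 5, 0, 0, tzinfo=utc), datetime(2016, 6, 17, 6, 34, 46, tzinfo=utc)),
]
for day, total in duration_by_weekday(sessions).items():
    print(f"{day},{total}")
```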

```
> python -m bin.online_duration --grouping hour db.csv
0:00:00,0:00:00
1:00:00,0:00:00
2:00:00,0:00:00
3:00:00,0:00:00
4:00:00,0:03:56
5:00:00,0:14:14
6:00:00,0:29:30
7:00:00,0:31:20
8:00:00,0:12:04
9:00:00,0:00:00
10:00:00,0:00:00
11:00:00,0:23:14
12:00:00,0:06:00
13:00:00,0:46:19
14:00:00,0:00:00
15:00:00,0:00:00
16:00:00,0:00:00
17:00:00,0:00:00
18:00:00,0:00:00
19:00:00,0:00:00
20:00:00,0:00:00
21:00:00,0:00:00
22:00:00,0:00:00
23:00:00,0:00:00
```
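Hour grouping can be sketched along the same lines. Split every online period at hour boundaries and add each piece to the bucket for its hour of day. Again, the `(start, end)` input shape and the function names are assumptions, not the script's actual implementation. The output rows mimic the `H:00:00,H:MM:SS` format above.

```python
from datetime import datetime, timedelta, timezone


def duration_by_hour(sessions):
    """Sum online time per hour of day (0-23) from hypothetical (start, end) UTC pairs."""
    totals = [timedelta() for _ in range(24)]
    for start, end in sessions:
        while start < end:
            # Split the period at the next full hour so each piece lies within one hour.
            next_hour = start.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            piece_end = min(end, next_hour)
            totals[start.hour] += piece_end - start
            start = piece_end
    return totals


def format_rows(totals):
    """Render rows like "4:00:00,0:03:56", mirroring the CSV output above."""
    return [f"{timedelta(hours=hour)},{total}" for hour, total in enumerate(totals)]


utc = timezone.utc
# A hypothetical period crossing the 5:00 boundary.
sessions = [(datetime(2016, 6, 17, 4, 56, 4, tzinfo=utc),
             datetime(2016, 6, 17, 5, 10, 0, tzinfo=utc))]
print("\n".join(format_rows(duration_by_hour(sessions))))
```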

In my opinion, the script's most useful feature is the ability to easily plot
this data, using any of the groupings from the examples above.
To produce a plot, pass `plot` as the `--output-format` parameter value and add
a file path to write the image to.

    > python -m bin.online_duration --output-format plot db.csv user.png

![user.png]

    > python -m bin.online_duration --output-format plot --grouping date db.csv date.png

![date.png]

    > python -m bin.online_duration --output-format plot --grouping weekday db.csv weekday.png

![weekday.png]

    > python -m bin.online_duration --output-format plot --grouping hour db.csv hour.png

![hour.png]

You can limit the scope of the database by supplying a time range.
Only online durations within that range are then processed.
Set the range by specifying either or both of the `--from` and `--to` parameters.
Values must be in the `%Y-%m-%dT%H:%M:%SZ` format (a subset of ISO 8601).

All dates and times are in UTC.
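Here is a minimal sketch of parsing `--from`/`--to` values with Python's `datetime.strptime`, using the format string given above. The trailing `Z` is matched literally and the result is marked as UTC. The document doesn't say whether a period that only partly overlaps the range is trimmed or dropped. The hypothetical `clip` helper below assumes it is trimmed.

```python
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # the format --from/--to values must use


def parse_utc(value):
    """Parse a --from/--to value; the literal trailing "Z" means UTC."""
    return datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)


def clip(start, end, date_from=None, date_to=None):
    """Hypothetical: trim [start, end) to the range, or return None if nothing is left."""
    if date_from is not None:
        start = max(start, date_from)
    if date_to is not None:
        end = min(end, date_to)
    return (start, end) if start < end else None


date_from = parse_utc("2016-06-17T05:00:00Z")
period = (datetime(2016, 6, 17, 4, 50, tzinfo=timezone.utc),
          datetime(2016, 6, 17, 5, 20, tzinfo=timezone.utc))
print(clip(*period, date_from=date_from))
```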

[matplotlib]: http://matplotlib.org/
[track_status.py]: track_status.md

[user.png]: images/user.png
[date.png]: images/date.png
[weekday.png]: images/weekday.png
[hour.png]: images/hour.png

Known issues
------------

* When people go online using the web version and don't visit other pages for a
  while (for example, while just listening to music), they appear offline.
  Hence the 0:00:00 durations you might sometimes encounter.
  The same might happen with other clients.

See also
--------

* [License]

[License]: ../README.md#license