online_duration.py
==================
View/visualize the amount of time people spend online.

Usage
-----

Run from the top-level directory using `python -m`:
```
> python -m bin.online_duration -h
usage: online_duration.py [-h] [--grouping {user,date,weekday,hour}]
                          [--input-format {csv,log,null}]
                          [--output-format {csv,json,plot}] [--from DATE_FROM]
                          [--to DATE_TO]
                          input [output]
```

This script additionally requires [matplotlib] to be installed.

It analyzes the database produced by [track_status.py] and calculates the total
amount of time people spent online.
For example, assuming the database in "db.csv" was previously generated by
[track_status.py]:
```
> python -m bin.online_duration db.csv
89497105,John,Smith,john.smith,0:12:31
3698577,Jane,Smith,jane.smith,1:34:46
```

In the example above, John Smith and Jane Smith spent approximately 13 and 95
minutes online, respectively.
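The CSV output has no header row, so reading it back requires supplying the field names. Below is a minimal Python sketch that does this and converts each duration to minutes, reproducing the 13 and 95 minutes quoted above. The field names are an assumption borrowed from the keys of the JSON output format, and `parse_duration` and `minutes_online` are hypothetical helpers, not part of the script.

```python
import csv
import io
from datetime import timedelta

# Assumed field names, borrowed from the keys of the script's JSON output;
# the CSV output itself has no header row.
FIELDS = ('uid', 'first_name', 'last_name', 'screen_name', 'duration')


def parse_duration(text):
    """Parse an 'H:MM:SS' string into a timedelta.

    This is the form str(timedelta) produces for durations under a day; the
    'N days, H:MM:SS' form used for longer durations is not handled.
    """
    hours, minutes, seconds = (int(part) for part in text.split(':'))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def minutes_online(csv_text):
    """Map each user's full name to their time online, rounded to whole minutes."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text), fieldnames=FIELDS):
        name = f"{row['first_name']} {row['last_name']}"
        result[name] = round(parse_duration(row['duration']).total_seconds() / 60)
    return result


# The CSV output from the example above.
OUTPUT = """\
89497105,John,Smith,john.smith,0:12:31
3698577,Jane,Smith,jane.smith,1:34:46
"""

print(minutes_online(OUTPUT))  # {'John Smith': 13, 'Jane Smith': 95}
```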
The output format is CSV (comma-separated values) by default.
To get a JSON document instead, pass `json` as the `--output-format` parameter
value:
```
> python -m bin.online_duration --output-format json db.csv
[
    {
        "uid": 89497105,
        "first_name": "John",
        "last_name": "Smith",
        "screen_name": "john.smith",
        "duration": "0:12:31"
    },
    {
        "uid": 3698577,
        "first_name": "Jane",
        "last_name": "Smith",
        "screen_name": "jane.smith",
        "duration": "1:34:46"
    }
]
```

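Since the JSON document is an array of objects, it is easy to post-process. As an illustration, this sketch sums the per-user durations; the helper names are hypothetical, and the sample is an abbreviated copy of the output that keeps only the fields the sketch needs. For these two users the total comes to 1:47:17.

```python
import json
from datetime import timedelta


def parse_duration(text):
    """Parse an 'H:MM:SS' string into a timedelta (the 'N days, ...' form is not handled)."""
    hours, minutes, seconds = (int(part) for part in text.split(':'))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def total_duration(json_text):
    """Sum the "duration" fields of every entry in the script's JSON output."""
    entries = json.loads(json_text)
    # Start the sum from timedelta() because sum() starts from 0 by default.
    return sum((parse_duration(entry['duration']) for entry in entries), timedelta())


# Abbreviated copy of the JSON output above (first_name and last_name omitted).
JSON_OUTPUT = '''
[
    {"uid": 89497105, "screen_name": "john.smith", "duration": "0:12:31"},
    {"uid": 3698577, "screen_name": "jane.smith", "duration": "1:34:46"}
]
'''

print(total_duration(JSON_OUTPUT))  # 1:47:17
```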
By default, the durations are calculated per user.
To group them differently, pass `date` (to group by date), `weekday` (to group
by day of the week) or `hour` (to group by hour of the day) as the `--grouping`
parameter value.
For example, assuming that both John and Jane spent their time online on Friday,
June 17, 2016:
```
> python -m bin.online_duration --output-format json --grouping date db.csv
[
{
"date": "2016-06-17",
"duration": "1:47:17"
}
]
```
```
> python -m bin.online_duration --output-format csv --grouping weekday db.csv
Monday,0:00:00
Tuesday,0:00:00
Wednesday,0:00:00
Thursday,0:00:00
Friday,1:47:17
Saturday,0:00:00
Sunday,0:00:00
```
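One way to picture the `weekday` grouping is as folding per-date totals into seven buckets, Monday first, with days that have no activity kept at zero. The sketch below illustrates that idea only; it is not taken from the script, and `group_by_weekday` is a hypothetical helper. Given the 1:47:17 total for Friday, June 17, 2016 from the `date` example, it prints the same seven lines as the output above.

```python
from datetime import date, timedelta

# Weekday names in the order date.weekday() numbers them (Monday == 0).
# Hard-coded instead of calendar.day_name, which depends on the current locale.
WEEKDAYS = ('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')


def group_by_weekday(daily_durations):
    """Fold {date: timedelta} totals into per-weekday totals, Monday first.

    Every weekday is present in the result, even with no time online,
    which matches the seven lines in the example output.
    """
    totals = {name: timedelta() for name in WEEKDAYS}
    for day, duration in daily_durations.items():
        totals[WEEKDAYS[day.weekday()]] += duration
    return totals


# Assumed input: the per-date total from the `date` grouping example.
daily = {date(2016, 6, 17): timedelta(hours=1, minutes=47, seconds=17)}

for name, duration in group_by_weekday(daily).items():
    print(f'{name},{duration}')
```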
```
> python -m bin.online_duration --grouping hour db.csv
0:00:00,0:00:00
1:00:00,0:00:00
2:00:00,0:00:00
3:00:00,0:00:00
4:00:00,0:03:56
5:00:00,0:14:14
6:00:00,0:29:30
7:00:00,0:31:20
8:00:00,0:12:04
9:00:00,0:00:00
10:00:00,0:00:00
11:00:00,0:23:14
12:00:00,0:06:00
13:00:00,0:46:19
14:00:00,0:00:00
15:00:00,0:00:00
16:00:00,0:00:00
17:00:00,0:00:00
18:00:00,0:00:00
19:00:00,0:00:00
20:00:00,0:00:00
21:00:00,0:00:00
22:00:00,0:00:00
23:00:00,0:00:00
```
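With the `hour` grouping, a session that crosses an hour boundary has to be split between the hours it touches. Here is a minimal sketch of that splitting, assuming sessions are available as (start, end) pairs of UTC datetimes; the sample session and the helpers `split_by_hour` and `group_by_hour` are hypothetical and are not the script's internals.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def split_by_hour(start, end):
    """Yield (hour_of_day, timedelta) pieces of [start, end), cut at every hour boundary."""
    while start < end:
        next_hour = start.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
        piece_end = min(end, next_hour)
        yield start.hour, piece_end - start
        start = piece_end


def group_by_hour(sessions):
    """Sum (start, end) sessions into 24 per-hour totals, hour 0 first."""
    totals = defaultdict(timedelta)  # missing hours default to timedelta() == 0:00:00
    for start, end in sessions:
        for hour, duration in split_by_hour(start, end):
            totals[hour] += duration
    return [totals[hour] for hour in range(24)]


# A made-up session from 04:50 to 06:10 UTC that spans three hours.
session = (datetime(2016, 6, 17, 4, 50, tzinfo=timezone.utc),
           datetime(2016, 6, 17, 6, 10, tzinfo=timezone.utc))

for hour, duration in enumerate(group_by_hour([session])):
    # str(timedelta(hours=4)) is '4:00:00', the same label style as the example.
    print(f'{timedelta(hours=hour)},{duration}')
```

The made-up session contributes 0:10:00 to hour 4, 1:00:00 to hour 5 and 0:10:00 to hour 6, and every other hour stays at 0:00:00.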
In my opinion, the script's most useful feature is the ability to easily create
plots of data like that in the examples above.
To produce a plot, pass `plot` as the `--output-format` parameter value and
supply the path of the image file to write as the `output` argument:
```
> python -m bin.online_duration --output-format plot db.csv user.png
```

![user.png]

```
> python -m bin.online_duration --output-format plot --grouping date db.csv date.png
```

![date.png]

```
> python -m bin.online_duration --output-format plot --grouping weekday db.csv weekday.png
```

![weekday.png]

```
> python -m bin.online_duration --output-format plot --grouping hour db.csv hour.png
```

![hour.png]

You can limit the scope of the database by supplying a time range.
Only the online durations that fall within that range are then processed.
Set the range by specifying either or both of the `--from` and `--to`
parameters.
Values must be in the `%Y-%m-%dT%H:%M:%SZ` format (a subset of ISO 8601), for
example `2016-06-17T00:00:00Z`.
All dates and times are in UTC.

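Values in this format can be parsed with `datetime.strptime`. The sketch below also shows one way a range could be applied to a single online session. Whether the script trims sessions that straddle a bound or skips them entirely is not stated here, so the clipping behavior, the helpers `parse_utc` and `clip_session`, and the sample session are all assumptions.

```python
from datetime import datetime, timezone

# The format --from and --to values must follow, per the description above.
DATE_FORMAT = '%Y-%m-%dT%H:%M:%SZ'


def parse_utc(text):
    """Parse a --from/--to style value into a timezone-aware UTC datetime."""
    return datetime.strptime(text, DATE_FORMAT).replace(tzinfo=timezone.utc)


def clip_session(start, end, date_from=None, date_to=None):
    """Return the part of [start, end) inside [date_from, date_to), or None.

    Assumed behavior: sessions are trimmed to the range rather than dropped.
    A missing bound leaves that side of the range open.
    """
    if date_from is not None:
        start = max(start, date_from)
    if date_to is not None:
        end = min(end, date_to)
    return (start, end) if start < end else None


date_from = parse_utc('2016-06-17T05:00:00Z')
date_to = parse_utc('2016-06-17T06:00:00Z')
session = (parse_utc('2016-06-17T04:50:00Z'), parse_utc('2016-06-17T06:10:00Z'))

start, end = clip_session(*session, date_from, date_to)
print(end - start)  # 1:00:00
```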
[matplotlib]: http://matplotlib.org/
[track_status.py]: track_status.md
[user.png]: images/user.png
[date.png]: images/date.png
[weekday.png]: images/weekday.png
[hour.png]: images/hour.png
Known issues
------------
* When people use the web version and don't navigate to other pages for a while
  (for example, when they're just listening to music), they appear offline.
  This is why you might sometimes encounter 0:00:00 durations.
  The same might happen with other clients as well.

See also
--------
* [License]

[License]: ../README.md#license