Self-Hosted Web Analytics with AWS and Crumby
Crumby is a self-hosted, open source application for tracking and reporting visitor usage of websites. Crumby is a Flask application, so it works with well-known tools such as Apache, MySQL, and Python. Check out Crumby on GitHub to see what data is tracked and how to interact with the reporting API.
In this post, I will cover how to install Crumby on an Amazon EC2 instance running httpd24 (with mod_wsgi) and mysql56. Many of the commands in this tutorial need to be run as root. If you get a permission error, just add sudo in front of the command.
Launch an Amazon EC2 Instance
Follow the instructions in the Amazon EC2 User Guide for Linux Instances to launch an Amazon Linux AMI.
Select the appropriate instance type and storage for your use case. For a small website, a t2.nano with 8 GiB EBS should work, but will probably require a swap file or some memory tuning.
Make sure the instance’s security group allows inbound SSH (port 22) so you can administer it, and HTTP/HTTPS (ports 80 and 443) for visitors.
Connect to the Instance
- SSH to the instance using your private key and public DNS name
Bonus: Set an SSH alias for your AWS instance in ~/.ssh/config to simplify connecting (see man ssh_config).
Host aws
HostName ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com
User ec2-user
IdentityFile ~/keys/aws.pem
ssh aws
Quick Deployment
If you would rather skip the explanation and just run the commands, the following steps will set up Crumby on a t2.nano. Otherwise, move on to the next section, Install Packages, to run each command manually.
-
Download scripts
wget https://raw.githubusercontent.com/bmweiner/crumby/master/deployment/linux/create_swap.sh
wget https://raw.githubusercontent.com/bmweiner/crumby/master/deployment/linux/apache_mysql.sh
wget https://raw.githubusercontent.com/bmweiner/crumby/master/deployment/linux/tune_memory.sh
-
Set the domain variable in apache_mysql.sh to the server’s public DNS name
domain='ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com'
-
Run the scripts as root
sudo sh create_swap.sh
sudo sh apache_mysql.sh
sudo sh tune_memory.sh
-
Now you are up and running and just need to include a pointer to cmb.js on the webpages you want to track. See Webpage Configuration.
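If you would rather not open an editor, you can script step 2 (setting the domain variable). Below is a minimal sketch. It assumes apache_mysql.sh contains a single line of the form domain='...', as the step implies. set_shell_var is a hypothetical helper for illustration and is not part of Crumby.

```python
import re


def set_shell_var(script_text: str, name: str, value: str) -> str:
    """Return script_text with the first line-initial `name=...` assignment replaced.

    Assumes a simple assignment at the start of a line and a value without
    single quotes (true for EC2 public DNS names). Raises ValueError if no
    assignment is found, so a typo in `name` does not pass silently.
    """
    pattern = re.compile(r"^{}=.*$".format(re.escape(name)), re.MULTILINE)
    replacement = "{}='{}'".format(name, value)
    # A function replacement avoids re interpreting backslashes in `value`.
    new_text, count = pattern.subn(lambda _m: replacement, script_text, count=1)
    if count == 0:
        raise ValueError("no assignment to {!r} found".format(name))
    return new_text
```

For example, use pathlib.Path('apache_mysql.sh') as p, then call p.write_text(set_shell_var(p.read_text(), 'domain', 'ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com')) before running the scripts.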
Install Packages
-
Install build dependencies
yum update -y
yum install -y gcc libffi-devel python-devel
-
Install Apache Web Server, MySQL, and mod_wsgi
yum install -y httpd24 mysql56-server mod24_wsgi-python27.x86_64
-
Create a directory for Crumby resources
mkdir /var/lib/crumby
-
Create and activate a virtual environment
virtualenv /var/lib/crumby/virtenv
source /var/lib/crumby/virtenv/bin/activate
-
Install python packages to the virtual environment
pip install crumby pymysql
-
Install GeoLite2-City database and create example crumby config file
cd /var/lib/crumby
crumby geoip
crumby init
-
Create a password for MySQL and a secret key
cat /dev/urandom | tr -dc 'a-zA-Z0-9' | head -c 24 > /var/lib/crumby/MYSQL_PASS
cat /dev/urandom | tr -dc 'a-zA-Z0-9' | head -c 24 > /var/lib/crumby/SECRET_KEY
-
(optional) Remove build dependencies
Note: This is just to save space. If you plan to use these tools in the future, just keep them.
yum erase -y gcc libffi-devel python-devel
yum autoremove -y
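The /dev/urandom | tr | head pipeline that creates MYSQL_PASS and SECRET_KEY produces a 24-character alphanumeric string. If you would rather do this in Python, the standard library's secrets module (Python 3.6 or later) draws from the same kind of cryptographically secure source. The sketch below is one way to write it. The helper names are illustrative. Restricting the file to owner-only permissions is an extra precaution that the original commands do not take.

```python
import os
import secrets
import string

# Same character set as tr -dc 'a-zA-Z0-9'
ALPHABET = string.ascii_letters + string.digits


def random_token(length: int = 24) -> str:
    """Return a random alphanumeric string chosen with a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def write_secret(path: str, length: int = 24) -> str:
    """Write a new token to path (no trailing newline) and return it.

    A newly created file gets mode 0600 (subject to the umask); the mode
    argument has no effect if the file already exists.
    """
    token = random_token(length)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(token)
    return token
```

Running write_secret('/var/lib/crumby/MYSQL_PASS') and write_secret('/var/lib/crumby/SECRET_KEY') as root would stand in for the two cat commands.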
Configure MySQL
-
Start MySQL and secure the installation. When prompted for the new root password, use the random value previously generated in /var/lib/crumby/MYSQL_PASS
service mysqld start
mysql_secure_installation
-
Create the crumby database
mysql -uroot -p$(cat /var/lib/crumby/MYSQL_PASS) --execute="CREATE DATABASE crumby;"
-
Set MySQL to start on reboot (modify run level)
chkconfig mysqld on
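The mysql -p$(cat …) form in step 2 puts the password on the command line, where other local users can see it in ps output. A common alternative is a client option file passed with --defaults-extra-file, which must be the first option on the command line. The minimal Python sketch below writes such a file and builds the command. The function name and the temporary-file approach are assumptions for illustration, not something Crumby provides.

```python
import os
import tempfile


def mysql_command(sql, password_file="/var/lib/crumby/MYSQL_PASS", user="root"):
    """Return (argv, option_file_path) for running `sql` without a password in argv.

    The caller should delete option_file_path once the command has finished.
    """
    with open(password_file) as f:
        password = f.read().strip("\n")
    # mkstemp creates the file readable and writable only by the current user.
    fd, opt_path = tempfile.mkstemp(suffix=".cnf")
    with os.fdopen(fd, "w") as f:
        f.write("[client]\nuser={}\npassword={}\n".format(user, password))
    # --defaults-extra-file must precede every other option.
    argv = ["mysql", "--defaults-extra-file=" + opt_path, "--execute=" + sql]
    return argv, opt_path
```

Usage: call argv, opt = mysql_command("CREATE DATABASE crumby;"), then run subprocess.run(argv, check=True) inside a try block whose finally clause calls os.remove(opt).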
Configure Apache
-
Open the Apache config file
vi /etc/httpd/conf/httpd.conf
-
Add a virtual host for the Crumby application. Replace site.com with the server’s public DNS name, and uncomment those lines
<VirtualHost *:80>
    #ServerName site.com
    #ServerAdmin admin@site.com
    WSGIProcessGroup crumby
    WSGIDaemonProcess crumby python-home=/var/lib/crumby/virtenv
    WSGIScriptAlias / /var/www/wsgi-scripts/crumby.wsgi
    <Directory /var/www/wsgi-scripts>
        <Files crumby.wsgi>
            Require all granted
        </Files>
    </Directory>
</VirtualHost>
-
Set Apache Web Server to start on reboot (modify run level)
chkconfig httpd on
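The "replace site.com and uncomment" edit is easy to get wrong by hand. The sketch below fills in the same port 80 virtual host with Python's string.Template. The directives mirror the block above. render_vhost and the $domain/$admin placeholders are hypothetical names introduced for this example.

```python
from string import Template

# Same directives as the port 80 virtual host above, with the
# ServerName/ServerAdmin lines uncommented and parameterized.
VHOST_80 = Template("""\
<VirtualHost *:80>
    ServerName $domain
    ServerAdmin $admin
    WSGIProcessGroup crumby
    WSGIDaemonProcess crumby python-home=/var/lib/crumby/virtenv
    WSGIScriptAlias / /var/www/wsgi-scripts/crumby.wsgi
    <Directory /var/www/wsgi-scripts>
        <Files crumby.wsgi>
            Require all granted
        </Files>
    </Directory>
</VirtualHost>
""")


def render_vhost(domain: str, admin_email: str) -> str:
    """Return the virtual host block with the server's DNS name filled in.

    Template.substitute raises KeyError if a placeholder is left unfilled.
    """
    return VHOST_80.substitute(domain=domain, admin=admin_email)
```

You could print render_vhost('ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com', 'admin@example.com') and append the output to /etc/httpd/conf/httpd.conf instead of editing the lines by hand.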
Configure SSL/TLS (Recommended)
Users that access private queries through the web UI will need to authenticate. To protect their credentials, configure Apache to use SSL/TLS. The AWS guide Configure Apache Web Server on Amazon Linux to Use SSL/TLS covers this in detail.
-
Install Apache module
yum install -y mod24_ssl
-
Open the Apache config file
vi /etc/httpd/conf/httpd.conf
-
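Add a virtual host to handle requests on port 443. Replace site.com with the server’s public DNS name, and uncomment those lines. The SSLEngine on line and the certificate/key lines enable TLS for this virtual host. The paths shown are the self-signed test certificate and key that mod24_ssl installs on Amazon Linux, so substitute your own certificate for production
<VirtualHost *:443>
    #ServerName site.com
    #ServerAdmin admin@site.com
    SSLEngine on
    SSLCertificateFile /etc/pki/tls/certs/localhost.crt
    SSLCertificateKeyFile /etc/pki/tls/private/localhost.key
    WSGIProcessGroup crumby
    WSGIScriptAlias / /var/www/wsgi-scripts/crumby.wsgi
    <Directory /var/www/wsgi-scripts>
        <Files crumby.wsgi>
            Require all granted
        </Files>
    </Directory>
</VirtualHost>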
Configure Crumby
-
Open crumby config for editing
vi /var/lib/crumby/crumby.cfg
-
Modify the parameters for production. See Crumby Configuration for a description of each parameter
import os
base_path = '/var/lib/crumby'
with open(os.path.join(base_path, 'MYSQL_PASS')) as f:
    pw = f.read().strip('\n')
db = 'mysql+pymysql://root:{}@localhost/crumby'.format(pw)
DOMAIN = ''  # server's public DNS name
SQLALCHEMY_DATABASE_URI = db
GEOIP2_DATABASE_NAME = '/var/lib/crumby/GeoLite2-City.mmdb'
with open(os.path.join(base_path, 'SECRET_KEY')) as f:
    SECRET_KEY = f.read().strip('\n')
SESSION_COOKIE_SECURE = True  # Include if you enabled SSL/TLS
CROSSDOMAIN_ORIGIN = ''  # URL(s) permitted to access the crumby API
-
Create the WSGI application script file
mkdir /var/www/wsgi-scripts
vi /var/www/wsgi-scripts/crumby.wsgi
-
Add the following contents to the script file
#!/usr/bin/env python
"""WSGI entry point."""
import os
os.environ['CRUMBY_SETTINGS'] = '/var/lib/crumby/crumby.cfg'
from crumby import app as application
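crumby.cfg is ordinary Python, so helper names such as base_path, pw and db are evaluated but harmless. Flask's Config.from_pyfile executes a config file and then keeps only UPPERCASE names as settings. This assumes Crumby loads the file named by CRUMBY_SETTINGS that way; check Crumby's source to confirm. The sketch below mimics that rule with the standard library so you can check which settings a config file actually defines. load_uppercase_settings is a hypothetical helper, not Flask's API.

```python
import types


def load_uppercase_settings(path: str) -> dict:
    """Execute a Python config file and return only its UPPERCASE top-level names.

    Mirrors the rule Flask applies in Config.from_object: lowercase helpers
    (base_path, pw, db, ...) run but are not kept as settings.
    """
    module = types.ModuleType("config")
    module.__file__ = path
    with open(path) as f:
        code = compile(f.read(), path, "exec")
    exec(code, module.__dict__)
    return {k: v for k, v in vars(module).items() if k.isupper()}
```

Running load_uppercase_settings('/var/lib/crumby/crumby.cfg') on the server should list DOMAIN, SQLALCHEMY_DATABASE_URI, SECRET_KEY and the other settings, but not pw or db.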
Startup and Testing
-
Restart MySQL and Apache
service mysqld restart
service httpd restart
-
View error logs to troubleshoot any errors
tail -f /etc/httpd/logs/error_log
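By default mod_wsgi looks for a callable named application in the WSGI script, which is why crumby.wsgi imports Crumby's Flask app under that name. Once both services are running, a quick HTTP request from your own machine confirms that Apache is serving the app. Below is a minimal sketch using only urllib. check_server is a hypothetical name. Which paths return 200 depends on Crumby's routes, so treat the root path as an assumption.

```python
import urllib.error
import urllib.request

# An opener with an empty ProxyHandler ignores http_proxy/https_proxy, so the
# request goes straight to the instance rather than through a local proxy.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check_server(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url; raise URLError if unreachable.

    4xx/5xx statuses are returned rather than raised, so a 500 from a
    misconfigured app is reported instead of looking like a network error.
    """
    try:
        with _DIRECT.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

For example, check_server('http://ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com/') should return a status code rather than raise. If your certificate is self-signed, https:// checks will fail certificate verification, so test http:// first. A 500 usually means the answer is in the error log above.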
Webpage Configuration
To begin sending tracking data to your analytics server, just include a pointer to the cmb.js script on the webpages you want to track. Replace site.com with the server’s public DNS name (e.g. ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com), and use http:// instead of https:// if you skipped the SSL/TLS section.
<script src="https://site.com/cmb.js"></script>
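If your site is a set of static HTML files, adding the tag to every page by hand is tedious. The sketch below inserts it automatically. It assumes each page has a closing </head> tag and that the script lives at https://<your-domain>/cmb.js as shown above. add_tracking_tag is a hypothetical helper. It skips pages that already contain the tag, so rerunning it is safe.

```python
import html
import re

_HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def add_tracking_tag(page: str, domain: str) -> str:
    """Insert the Crumby <script> tag just before </head>, unless already present."""
    tag = '<script src="https://{}/cmb.js"></script>'.format(html.escape(domain))
    if tag in page:
        return page  # idempotent: running it twice changes nothing
    m = _HEAD_CLOSE.search(page)
    if m is None:
        raise ValueError("page has no </head> tag")
    return page[:m.start()] + tag + "\n" + page[m.start():]
```

To apply it to a site directory, loop over pathlib.Path('site').rglob('*.html') and write add_tracking_tag(p.read_text(), domain) back to each file.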
(optional) Tune Settings for a Low Memory VPS
If you are running an instance with low memory (e.g. a t2.nano, which has 512 MiB), MySQL will probably fail to start. You can try creating a swap file and/or adjusting the memory allocation settings for MySQL and Apache. Alternatively, you can provision an EC2 instance with more memory.
Create a Swap File
Refer to the DigitalOcean guide How To Add Swap on CentOS 7 for details. Here is the general gist:
-
Setup a swap file
fallocate -l 1G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
-
Enable the swap file on boot by adding this line to /etc/fstab
/swapfile swap swap sw 0 0
-
Set swappiness and cache pressure parameters in /etc/sysctl.conf
# Lower swappiness and cache pressure for a low-memory VPS
vm.swappiness = 10
vm.vfs_cache_pressure = 50
-
Apply the parameter settings
sysctl -p
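To confirm that the swap file is active after swapon (or after a reboot), you can check SwapTotal in /proc/meminfo. You can also read the applied sysctl values back from /proc/sys. The minimal Python sketch below does both. The function names are illustrative.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo content into {field: integer value}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def swap_total_mib(path: str = "/proc/meminfo") -> float:
    """Return total swap in MiB as reported by the kernel (0 if none is active)."""
    with open(path) as f:
        return parse_meminfo(f.read())["SwapTotal"] / 1024


def read_sysctl(name: str) -> str:
    """Read a kernel parameter such as 'vm.swappiness' from /proc/sys."""
    with open("/proc/sys/" + name.replace(".", "/")) as f:
        return f.read().strip()
```

On the instance, swap_total_mib() should be close to 1024 after the steps above, and read_sysctl('vm.swappiness') should return '10' once sysctl -p has run.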
Adjust Memory Allocation
Refer to InnoDB Startup Options and System Variables for the InnoDB parameters, and to Server System Variables for key_buffer_size and max_connections.
-
Configure MySQL server system variables in /etc/my.cnf
[mysqld]
innodb_buffer_pool_size=32M
innodb_log_buffer_size=256K
key_buffer_size=8
max_connections=10
-
Set Apache configuration directives in /etc/httpd/conf/httpd.conf
# configure directives for low memory
<IfModule prefork.c>
    StartServers 3
    MinSpareServers 2
    MaxSpareServers 5
    MaxClients 10
    MaxRequestsPerChild 1000
</IfModule>
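A common rule of thumb for choosing MaxClients (called MaxRequestWorkers in httpd 2.4) under prefork is to divide the memory you can spare for Apache by the average resident size of one httpd child. This heuristic does not come from the Crumby docs. The minimal sketch below does that arithmetic. Measure your own process size, for example with ps -o rss -C httpd, rather than trusting the placeholder figures.

```python
def max_workers(total_mib: float, reserved_mib: float, per_process_mib: float, floor: int = 1) -> int:
    """Estimate a prefork worker limit: memory left for Apache / memory per child.

    reserved_mib should cover everything else on the box: the OS, MySQL, and
    the mod_wsgi daemon process that runs Crumby in daemon mode.
    """
    if per_process_mib <= 0:
        raise ValueError("per_process_mib must be positive")
    available = total_mib - reserved_mib
    return max(floor, int(available // per_process_mib))
```

For example, max_workers(512, 300, 20) returns 10. The 300 MiB reserve and 20 MiB per child are made-up figures for illustration, not measurements from a t2.nano.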