Server-side tracking with Piwik and Django

Business owners want to track usage to gain insight into how users actually use their sites and apps. However, tracking can raise privacy concerns, hurt site performance, and create security risks by inviting third-party JavaScript to run.

For Passit, an open source password manager, we wanted to track how people use our app and view our passit.io marketing site. However, we serve a privacy-sensitive market, and letting a company like Google snoop on your password manager feels very wrong. Our solution is to use the open source, self-hosted Piwik analytics application with server-side tracking.

Traditional client-side tracking for our marketing site

passit.io uses the Piwik JavaScript tracker. It runs on the same domain (piwik.passit.io) and doesn’t get flagged by Privacy Badger as a tracking tool. It won’t track your entire web history the way Google Analytics or Facebook Like buttons do.

Nice green 0 from Privacy Badger!

To respect privacy we can keep the default Piwik settings, which anonymize IP addresses and respect the Do Not Track header.

Server-side tracking for app.passit.io

We’d like to have some idea of how people use our app as well: sign-ups, log-ins, group usage, etc. However, injecting client-side code feels wrong here. It would waste your computer’s resources reporting your movements to our Piwik server, and it provides an attack vector: what if someone hijacked our Piwik server and tried to inject arbitrary JavaScript into the Passit app?

Instead, we can track usage of the app.passit.io API on the server side. Simply counting how many people hit different API endpoints gives a good indication of user activity.

Django and Piwik

Presenting django-server-side-piwik – a drop-in Django app that uses middleware and Celery to record server-side analytics. Let’s talk about how it’s built.

server_side_piwik uses the Python piwikapi package to track server-side usage; its quickstart section shows how. We can implement it as Django middleware: every request has some data serialized and sent to a Celery task for further processing. This means the main request thread isn’t blocked and we don’t slow down the app just to run analytics.

from django.conf import settings
from ipware.ip import get_ip  # django-ipware provides the client IP

from .tasks import record_analytic  # the Celery task that sends the hit to Piwik


class PiwikMiddleware(object):
    """ Record every request to Piwik """
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)

        SITE_ID = getattr(settings, 'PIWIK_SITE_ID', None)
        if SITE_ID:
            ip = get_ip(request)
            keys_to_serialize = [
                'HTTP_USER_AGENT',
                'REMOTE_ADDR',
                'HTTP_REFERER',
                'HTTP_ACCEPT_LANGUAGE',
                'SERVER_NAME',
                'PATH_INFO',
                'QUERY_STRING',
            ]
            data = {
                'HTTPS': request.is_secure()
            }
            for key in keys_to_serialize:
                if key in request.META:
                    data[key] = request.META[key]
            # Hand off to Celery so the request thread isn't blocked
            record_analytic.delay(data, ip)
        return response
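
The middleware only serializes the request data; the actual hit is sent to Piwik from the Celery task. Here is a rough sketch of what a record_analytic task could look like, following the piwikapi quickstart. The setting names PIWIK_API_URL and PIWIK_TOKEN_AUTH are assumptions for illustration, not necessarily what django-server-side-piwik uses.

from celery import shared_task
from django.conf import settings
from piwikapi.tracking import PiwikTracker
from piwikapi.tests.request import FakeRequest


@shared_task
def record_analytic(headers, ip):
    """ Send one page view to Piwik using the serialized request data. """
    # piwikapi expects something request-like; FakeRequest wraps the headers dict
    request = FakeRequest(headers)
    tracker = PiwikTracker(settings.PIWIK_SITE_ID, request)
    tracker.set_api_url(settings.PIWIK_API_URL)        # e.g. the piwik.php endpoint (assumed setting name)
    tracker.set_token_auth(settings.PIWIK_TOKEN_AUTH)  # auth token is required to set the visitor IP
    tracker.set_ip(ip)
    tracker.do_track_page_view(headers.get('PATH_INFO', ''))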

 

Now you can track usage from the backend in a way that better respects user privacy. No JavaScript and no Google Analytics involved!

Feel free to check out the project on GitLab and let me know of any comments or issues. Passit’s source is also on GitLab.

Finding near locations with GeoDjango and PostGIS, Part I

With GeoDjango we can find places in proximity to other places – this is very useful for things like a store locator. Let’s use a store locator as an example. Ours needs to be able to read in messy user input (ZIP code, address, city, or some combination) and then locate any stores we have nearby.

General concept and theory


We have two problems to solve. One is to turn messy address input into a point on the globe. Then we need a way to query this point against other known points and determine which locations are close.

Set up known locations

Before we can really begin we need to set up GeoDjango. You can read the docs or use docker-compose.

It’s still a good idea to read the tutorial even if you use docker.

Let’s add a location. Something like:

from django.contrib.gis.db import models


class Location(models.Model):
    name = models.CharField(max_length=70)
    point = models.PointField()

    # GeoManager was needed for spatial lookups before Django 1.9
    objects = models.GeoManager()

A PointField stores a point on the map. Because Earth is not flat we can’t use simple X, Y coordinates. Luckily, you can almost think of longitude and latitude as X and Y, and GeoDjango defaults to this (SRID 4326, i.e. WGS84). It’s also easy to get latitude and longitude from places like Google Maps. So if we want, we can ignore the complexities of mapping coordinates on Earth – or you can read up on SRIDs if you want to learn more.
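
To make this concrete, here’s how a point could be created in code. The import path for Location is an assumption; note that a GEOS Point takes longitude first, then latitude.

from django.contrib.gis.geos import Point
from myapp.models import Location  # wherever the Location model above lives

# (x, y) = (longitude, latitude); the default SRID is 4326 (WGS84)
Location.objects.create(name="Empire State Building", point=Point(-73.9857, 40.7484))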

At this point we can start creating locations with points – but for ease of use, add GeoModelAdmin to the Django admin so you can set points with OpenStreetMap.

from django.contrib import admin
from django.contrib.gis.admin import GeoModelAdmin
from .models import Location

@admin.register(Location)
class LocationAdmin(GeoModelAdmin):
    pass


Wow! We’re doing GIS!

Add a few locations. If you want to get their coordinates just type location.point.x (or y).

Querying for distance

Django has some docs for this. Basically, make a new point, then query by distance, like this:

from django.contrib.gis.geos import fromstr
from django.contrib.gis.measure import D
from .models import Location

geom = fromstr('POINT(-73 40)')
Location.objects.filter(point__distance_lte=(geom, D(m=10000)))

m is meters – D accepts other units too, such as D(km=10) or D(mi=5). The result is a queryset of Locations within that distance of our “geom” point.
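
If you also want the results ordered by proximity, newer Django versions (1.9+) provide a Distance annotation. Continuing from the query above, a minimal sketch assuming PostGIS:

from django.contrib.gis.db.models.functions import Distance

# Annotate each nearby Location with its distance from geom and sort nearest-first
nearby = (
    Location.objects
    .filter(point__distance_lte=(geom, D(m=10000)))
    .annotate(distance=Distance('point', geom))
    .order_by('distance')
)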

Already we can find locations near other locations or arbitrary points! In Part II I’ll explain how to use OpenStreetMap to turn a fuzzy query like “New York” into a point – and from there we can make a store locator!

Building an API for django activity stream with Generic Foreign Keys

I wanted to build a django-rest-framework API for interacting with django-activity-stream. Activity stream uses generic foreign keys heavily, which aren’t naturally supported by DRF serializers. We can, however, reuse existing serializers and nest the data conditionally.

Here is a ModelSerializer for activity stream’s Action model.

from rest_framework import serializers
from actstream.models import Action
from myapp.models import ThingA, ThingB
from myapp.serializers import ThingASerializer, ThingBSerializer

class GenericRelatedField(serializers.Field):
    def to_representation(self, value):
        if isinstance(value, ThingA):
            return ThingASerializer(value).data
        if isinstance(value, ThingB):
            return ThingBSerializer(value).data
        # Not found - return string.
        return str(value)

class ActionSerializer(serializers.ModelSerializer):
    actor = GenericRelatedField(read_only=True)
    target = GenericRelatedField(read_only=True)
    action_object = GenericRelatedField(read_only=True)

    class Meta:
        model = Action
        fields = '__all__'  # newer DRF versions require an explicit fields declaration

GenericRelatedField checks whether the value is an instance of a known model and, if so, serializes it with the appropriate serializer.

Next we can use a viewset for displaying Actions. Since activity stream uses querysets, it’s pretty simple to integrate with a ModelViewSet. In my case I check a GET parameter to determine whether we want all actions, actions of people the logged-in user follows, or the user’s own actions. I added some filters on actor and target content type too.

from rest_framework import viewsets
from actstream.models import user_stream, Action
from .serializers import ActionSerializer


class ActivityViewSet(viewsets.ReadOnlyModelViewSet):
    serializer_class = ActionSerializer

    def get_queryset(self):
        following = self.request.GET.get('following')
        if following and following != 'false' and following != '0':
            if following == 'myself':
                qs = user_stream(self.request.user, with_user_activity=True)
                return qs.filter(actor_object_id=self.request.user.id)
            else:  # Everyone else but me
                return user_stream(self.request.user)
        return Action.objects.all()

    # These filters require django-filter's DjangoFilterBackend to be enabled
    filter_fields = (
        'actor_content_type', 'actor_content_type__model',
        'target_content_type', 'target_content_type__model',
    )
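
To expose the viewset, register it with a router in urls.py. A minimal sketch – the URL prefix and module path are assumptions, and since the viewset has no queryset attribute a basename must be given explicitly:

from rest_framework import routers
from .views import ActivityViewSet

router = routers.DefaultRouter()
router.register(r'activity', ActivityViewSet, basename='activity')  # base_name on older DRF versions

urlpatterns = router.urls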

Here’s the end result, lots of nested data.

Adding a new form to a formset

Everything I read about adding a new form to a formset with JavaScript involves cloning an existing form. This is a terrible method: what if there are zero initial forms? What about initial data? Here’s, in my opinion, a better way to do it using empty_form, which Django provides to render a form whose index is __prefix__ so you can easily replace it.

Add this under your “Add new FOO” button. In my case I have a question_form with many answers (answers_formset).
[html]
<script type="text/javascript">
var form_count_{{ question_form.prefix }} = {{ answers_formset.total_form_count }};
$('#add_more_{{ question_form.prefix }}').click(function() {
    // Render the empty form, swapping __prefix__ for the next form index
    var form = '{{ answers_formset.empty_form.as_custom|escapejs }}'.replace(/__prefix__/g, form_count_{{ question_form.prefix }});
    $('#answers_div_{{ question_form.prefix }}').append(form);
    form_count_{{ question_form.prefix }}++;
    // Keep the management form's TOTAL_FORMS in sync
    $('#id_{{ answers_formset.prefix }}-TOTAL_FORMS').val(form_count_{{ question_form.prefix }});
});
</script>
[/html]

This renders your empty_form right in the JavaScript, replaces __prefix__ with the correct number, and inserts it – in my case into an answers_div. Note the empty_form.as_custom: you could use empty_form directly, but that would only give you the basic form HTML. I want custom HTML, so I made a separate template for it. Here’s mine, though it’s just an example.

[html]
{{ answer.non_field_errors }}
{% for hidden in answer.hidden_fields %} {{ hidden }} {% endfor %}
<table>
<tr>
<td>
<span class="answer_span">{{ answer.answer }} {{ answer.answer.errors }}</span>
</td>
……etc…….
</tr>
</table>
[/html]

In your original template you can include the forms like this: {% include "omr/answer_form.html" with answer=answer %}
But for as_custom to work you need to add the method to the form itself.

[python]
from django.template import loader, Context

def as_custom(self):
    t = loader.get_template('answer_form.html')
    # On Django >= 1.10, pass a plain dict instead of a Context
    return t.render(Context({'answer': self}))
[/python]

I find this method far more stable than trying to clone existing forms. It plays well with the JavaScript I have in some of my widgets, whereas cloning gave me tons of trouble and required hacks to fix.

Django get_or_default

Quick hack today. Often I find myself wanting to get some Django object but, in the case where it doesn’t exist, default it to some value. Specifically, I keep my end-user-configurable settings in my database. Typically I set this up with initial data so all the settings are already there, but sometimes I’ll add a setting and forget to add it on some site instance.

[python]
class Callable:
    def __init__(self, anycallable):
        self.__call__ = anycallable

def get_or_default(name, default=None):
    """ Get the config object or create it with a default. Always use this when getting configs """
    object, created = Configuration.objects.get_or_create(name=name)
    if created:
        object.value = default
        object.save()
    return object
get_or_default = Callable(get_or_default)
[/python]

Now I can safely call things like edit_all = Configuration.get_or_default("Edit all fields", "False"), which returns my configuration object with the value set to "False" if not specified. Much better than a 500 error. There are plenty of other uses for this type of logic – a get_or_return_none, for example. The goal for me is to stop 500 errors from my own carelessness by having safe defaults.
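
On modern Python the same idea can be written with a staticmethod instead of the old Callable trick. A minimal sketch, assuming a Configuration model with name and value fields:

[python]
from django.db import models

class Configuration(models.Model):
    name = models.CharField(max_length=100, unique=True)
    value = models.CharField(max_length=255, blank=True)

    @staticmethod
    def get_or_default(name, default=None):
        """ Get the config object, creating it with a default value if missing. """
        obj, created = Configuration.objects.get_or_create(name=name)
        if created:
            obj.value = default
            obj.save()
        return obj
[/python]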

Django admin: better export to XLS


The goal here is to make a slick GUI for selecting exactly what the user wants to export from Django’s change list view. It will be a global action, so let’s start there.

[python]
from django.contrib import admin
from django.contrib.contenttypes.models import ContentType
from django.http import HttpResponseRedirect

def export_simple_selected_objects(modeladmin, request, queryset):
    selected_int = queryset.values_list('id', flat=True)
    selected = []
    for s in selected_int:
        selected.append(str(s))
    ct = ContentType.objects.get_for_model(queryset.model)
    return HttpResponseRedirect("/export_to_xls/?ct=%s&ids=%s" % (ct.pk, ",".join(selected)))
export_simple_selected_objects.short_description = "Export selected items to XLS"
admin.site.add_action(export_simple_selected_objects)
[/python]

This adds a global action called “Export selected items to XLS”. I went with xls instead of ods because xlwt is very mature and LibreOffice can open xls just fine. It’s limited by the max length of GET variables because it just lists each id (see this bug report). Next is the view.

[python]
import os
import tempfile

import xlwt
from django.contrib.contenttypes.models import ContentType
from django.http import HttpResponse
from django.shortcuts import render_to_response
from django.template import RequestContext

def admin_export_xls(request):
    model_class = ContentType.objects.get(id=request.GET['ct']).model_class()
    queryset = model_class.objects.filter(pk__in=request.GET['ids'].split(','))
    model_fields = model_class._meta.fields

    if 'xls' in request.POST:
        workbook = xlwt.Workbook()
        worksheet = workbook.add_sheet(unicode(model_class._meta.verbose_name_plural))
        fields = []
        # Get selected fields from POST data
        for field in model_fields:
            if 'field__' + field.name in request.POST:
                fields.append(field)
        # Title row
        for i, field in enumerate(fields):
            worksheet.write(0, i, field.verbose_name)
        for ri, row in enumerate(queryset):  # ri: row index over the queryset
            for ci, field in enumerate(fields):  # ci: cell (column) index over the fields
                worksheet.write(ri + 1, ci, unicode(getattr(row, field.name)))
        # Boring file handling crap
        fd, fn = tempfile.mkstemp()
        os.close(fd)
        workbook.save(fn)
        fh = open(fn, 'rb')
        resp = fh.read()
        fh.close()
        response = HttpResponse(resp, mimetype='application/ms-excel')
        response['Content-Disposition'] = 'attachment; filename=%s.xls' % (
            unicode(model_class._meta.verbose_name_plural),)
        return response

    return render_to_response('export_to_xls.html', {
        'model_name': model_class._meta.verbose_name,
        'fields': model_fields,
    }, RequestContext(request, {}),)
[/python]
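
The admin action above redirects to /export_to_xls/, so that URL needs to resolve to the view. A minimal sketch, assuming the view lives in myapp.views and using newer list-style urlpatterns (the original post doesn’t show its URL conf):

[python]
from django.conf.urls import url
from myapp.views import admin_export_xls

urlpatterns = [
    url(r'^export_to_xls/$', admin_export_xls, name='admin_export_xls'),
]
[/python]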

With the URL set up, next is the HTML – maybe something like this:

[html]
<script type="text/javascript">
$(document).ready(function() {
    // "Check all" toggles every field checkbox
    $("#check_all").click(function() {
        var checked_status = this.checked;
        $(".check_field").each(function() {
            this.checked = checked_status;
        });
    });
});
</script>

<h2> Export {{ model_name }} </h2>
<form method="post" action="">
<table>
<tr>
<th>
<input type="checkbox" id="check_all" checked="checked" />
</th>
<th>
Field
</th>
</tr>
{% for field in fields %}
<tr>
<td>
<input type="checkbox" class="check_field" checked="checked" name="field__{{ field.name }}"/>
</td>
<td>
{{ field.verbose_name }}
</td>
</tr>
{% endfor %}
</table>
<input type="submit" name="xls" value="Submit"/>
</form>
[/html]

The JavaScript just makes the check-all box work. Note that I use jQuery; if you don’t, you will need to rewrite it. Very simple, but it works, and now users won’t have to delete unwanted columns from xls reports. Notice, though, that the user is left on the export screen instead of back on the change list. Some AJAX can solve this. I’m overriding the global change_list.html, which isn’t ideal if you use any plugins that also override it. Here’s what I added.

[html]
<script src="/static/js/jquery.tools.min.js"></script>

<script type="text/javascript">
$(document).ready(function() {
    $(".button").click(function() {
        // Only intercept when the export action is selected and rows are checked
        if (
            $("option[value=export_simple_selected_objects]:selected").length
            && $("input:checked").length
        ) {
            // Post the change list form via AJAX and load the response into the overlay div
            $.post(
                "",
                $("#changelist-form").serialize(),
                function(data) {
                    $("#export_xls_form").html(data);
                }
            );
            $("#export_xls_form").overlay({
                top: 60
            });
            $("#export_xls_form").overlay().load();
            return false;
        }
    });
});
</script>

<!-- Overlay div goes here; when you edit the CSS, make sure its display is set to none initially -->
[/html]

I use jQuery Tools’ overlay to make a nice overlay screen while keeping the user on the change list page. Basically I want a div to appear and then load some content via AJAX. What’s cool is that I just post the data to "", so the regular Django admin functions work without editing them for AJAX. Well, I did add onclick='$("#export_xls_form").overlay().close();' to the submit button to close the window when submitting. OK, I’m a complete liar: I also added get_variables = request.META['QUERY_STRING'] to the view as a cheap way to keep those GET variables. But hey, it still works as a non-AJAX admin action, and that’s cool.

In the screenshot I added a CSS3 shadow and rounded corners to make it look better.

What’s next? Well it would be nice if we could access foreign key fields. If this had some type of advanced search and saving mechanism, we’d have a full generic Django query builder. Hmm.

Spreadsheet reporting in Django

It’s often desirable to quickly export data in a generic way that takes little coding. There are already some solutions for this, but I didn’t like any of them because they ignore many-to-many fields. One could argue a more robust system is needed for more complex reports, with GUI SQL query builders and such. Screw that, here’s a hack to get the job 95% done.

I want to check off items in the Django admin’s change list, then run an action that makes a spreadsheet report – not some crap CSV.

Here is my hack function that takes a model and queryset and spits out an .xls file:

import os
import tempfile

import xlwt
from django.contrib.contenttypes.models import ContentType
from django.http import HttpResponse

def admin_export_xls(request, app, model, qs=None):
    """ Exports a model to xls.
    qs: optional queryset if only exporting some data"""
    mc = ContentType.objects.get(app_label=app, model=model).model_class()
    wb = xlwt.Workbook()
    ws = wb.add_sheet(unicode(mc._meta.verbose_name_plural))
    # Write all field names as the header row, even m2m
    for i, field in enumerate(mc._meta.get_all_field_names()):
        ws.write(0, i, field)
    if not qs:
        qs = mc.objects.all()

    for ri, row in enumerate(qs):
        for ci, f in enumerate(mc._meta.get_all_field_names()):
            try:
                # terrible way to detect an m2m manager
                if unicode(getattr(row, f))[1:51] == 'django.db.models.fields.related.ManyRelatedManager':
                    # If it's a M2M relationship, serialize it and throw it in one cell.
                    value = ""
                    for item in getattr(row, f).all():
                        value += unicode(item) + ", "
                    value = value[:-2]
                    ws.write(ri + 1, ci, value)
                else:
                    ws.write(ri + 1, ci, unicode(getattr(row, f)))
            except:
                # happens when the m2m has an appended _set. This is a hack that works sometimes; it sucks, I know
                try:
                    f += "_set"
                    value = ""
                    for item in getattr(row, f).all():
                        value += unicode(item) + ", "
                    value = value[:-2]
                    ws.write(ri + 1, ci, value)
                except:
                    ws.write(ri + 1, ci, "")
    fd, fn = tempfile.mkstemp()
    os.close(fd)
    wb.save(fn)
    fh = open(fn, 'rb')
    resp = fh.read()
    fh.close()
    response = HttpResponse(resp, mimetype='application/ms-excel')
    response['Content-Disposition'] = 'attachment; filename=%s.xls' % (
        unicode(mc._meta.verbose_name_plural),)
    return response

That will serialize many-to-many fields as comma-separated values in a single cell. Next you need an admin action.

def export_selected_objects(modeladmin, request, queryset):
    app = queryset[0]._meta.app_label
    model = queryset[0]._meta.module_name
    return admin_export_xls(request, app, model, queryset)
export_selected_objects.short_description = "Export selected items to XLS"

You can see I made mine global; it works on all models. For this to work well you need to make sure the unicode representations of your models are useful. The most common thing I run into is phone numbers: a person can have unlimited phone numbers, and the end user will assume a report of people will include them. I make my number’s unicode something like “Cell: 555-555-5555”.
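
For example, a phone number model’s unicode method might look like this. The PhoneNumber model and its fields are made up for illustration, in the same Python 2 / old-Django style as the rest of the post:

from django.db import models

class PhoneNumber(models.Model):
    label = models.CharField(max_length=20)   # e.g. "Cell", "Home"
    number = models.CharField(max_length=20)
    person = models.ForeignKey('Person', related_name='phone_numbers')

    def __unicode__(self):
        # Shows up as "Cell: 555-555-5555" in the exported cell
        return u"%s: %s" % (self.label, self.number)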

Of course this code isn’t perfect, and there are many times a more robust solution will be needed. What if you want a report of companies with contacts at each company and the phone numbers of each contact? At that point you need to generate a query that can fetch such data, and that’s going to take some gross GUI query builder program or custom reports written by you, the developer.