Building a continuous integration workflow with GitLab and OpenShift 3

In this post I’ll go over building and testing a Docker image with GitLab CI and then pushing that image to OpenShift 3. It should also be somewhat helpful for people using other container platforms like Kubernetes or other CI solutions like Jenkins. I’m using Django for the project, with some front-end assets built in Node.

Our goal is to have one Docker environment used across development, CI, staging, and production, without repeating ourselves when building images.

At a high level, my workflow looks like this:

[Diagram: the high-level workflow from local development through GitLab CI to OpenShift]

Local development

All local development happens with Docker Compose. There is plenty of info on that elsewhere, so I’ll skip most of it. I will point out that I want to use the same Python-based Docker image for development and, later, production.

Continuous Integration

I’m using GitLab and the GitLab CI runner to run tests and build a Docker image. GitLab has some docs on how to build a Docker image. The executor choices are shell and Docker-in-Docker. I found Docker-in-Docker to be slow, complex, and error-prone, even though in theory it would be better since the environments are more isolated.

GitLab CI: building images with the shell executor

Here is my full .gitlab-ci.yml file for reference.

The goal here is to build the image, run unit tests, deploy on success, and always clean up. I had to run the GitLab CI runner as root, otherwise I would get permission errors. 😦

In a non-trivial CI system, the shell executor can get messy too. We need to worry about building too many images and filling up all the disk space, exhausting Docker’s pool of network subnets, and making sure concurrent builds work (if you need them).

Disk space – I suggest using a service that lets you attach a large volume formatted with a lot of inodes and using Docker’s overlayfs storage driver. I used an AWS EC2 instance with a 120 GB mount for /var/docker. See this blog post for details. Pay attention to the part where you define inodes. I went with 16568256.

Docker clean up – GitLab has a Docker image that can help clean up Docker images and containers for you here. I’d also consider restarting the server at night and running your own cleanup scripts too. I also add a CI cleanup stage like this:

stages:
  - build
  - test
  - deploy
  - clean

...
variables:
  PROJECT_NAME: myproject$CI_BUILD_REF
  COMPOSE: docker-compose -p myproject$CI_BUILD_REF

...
clean_docker:
  stage: clean
  when: always
  script:
    - $COMPOSE stop
    - $COMPOSE down
    - $COMPOSE rm -f

I’ve been using Docker since 1.0 and I’m still amazed by how it finds new ways of breaking itself. You may need to add your own hacks to seek and destroy Docker images and containers that would otherwise pile up forever.
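For what it’s worth, here is a rough sketch of the kind of cleanup hack I mean – something you might run nightly from cron. The script is illustrative only, not taken from my actual pipeline.

#!/usr/bin/env python3
"""Illustrative nightly Docker cleanup: remove exited containers and dangling images."""
import subprocess


def docker_ids(args):
    # Ask the Docker CLI for a list of IDs (one per line) and split them.
    out = subprocess.run(["docker"] + args, capture_output=True, text=True).stdout
    return out.split()


# Remove stopped containers left behind by old builds
for container in docker_ids(["ps", "-aq", "-f", "status=exited"]):
    subprocess.run(["docker", "rm", container])

# Remove dangling (untagged) images
for image in docker_ids(["images", "-q", "-f", "dangling=true"]):
    subprocess.run(["docker", "rmi", image])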

The $CI_BUILD_REF variable ensures each Docker image name is unique – this lets us run multiple builds and have some certainty that the image being tested is the one being pushed to Docker Hub.

The test stages are rather Django/Node specific. Just place whatever commands need to run your tests here. If they exit with a success code, GitLab CI will know the stage passed.

Pushing to Docker Hub – I’m tagging my tested image, pushing it to Docker Hub, and running a webhook to notify OpenShift to automatically pull the image and deploy it to staging.

deploy_staging:
  stage: deploy
  only:
    - qa
  script:
    - echo "Tag and push ${PROJECT_NAME}_web"
    - docker tag ${PROJECT_NAME}_web ${IMAGE_NAME}:qa
    - docker push ${IMAGE_NAME}:qa
    - "./bin/send-deploy-webhook.sh qa $DEPLOY_WEBHOOK_STAGING"

Notice how I’m only running this on the qa branch and that I’m tagging the image as “qa”. I’m using Docker tags so that one image repository can cover the different stages – dev, staging, and production.

OpenShift with the Docker build strategy

OpenShift lets you build using a source-to-image strategy or a Docker strategy. Source-to-image would mean rebuilding a Docker image – which we already did in CI – so let’s not use that. The Docker strategy was a bit confusing to me, however. I ended up with a very minimal build stage using OpenShift’s Docker build strategy. Here is a snippet from my build YAML.

  source:
    type: Dockerfile
    dockerfile: "FROM thelab/tsi-cocoon:dev\nRUN ./manage.py collectstatic --noinput"
  strategy:
    type: Docker
    dockerStrategy:
      from:
        kind: DockerImage
        name: 'docker.io/user/image:dev'
      pullSecret:
        name: dockerhub
      forcePull: true

Notice the source type Dockerfile with a VERY minimal inline Dockerfile that just pulls the right image and collects static files (that part is Django specific – it could really just be FROM image-name).

The strategy is set to type: Docker and includes my Docker image and the “secret” needed to pull the image from my private repo. Note that you must specify the full Docker registry (docker.io/etc.) or it will not work with a private registry. You need to add the secret using oc secrets new dockerhub .dockercfg=dockercfg, where dockercfg is the file that might be under ~/.dockercfg.

forcePull is set to true so that OpenShift does a docker pull each time.

You’ll need to define the deployment, services, etc. in OpenShift – but I consider that outside the scope of this post. I switched a source-to-image build to a Docker-based one without having to touch anything else.

That’s it – the same Docker image you used with Compose locally should now be on OpenShift. I set up a workflow where git commits to specific branches automatically deploy to OpenShift staging environments. Then I manually trigger the production deploy using the same image as staging.

Extending OpenShift’s source to image

OpenShift 3 introduces a new concept, “source to image” or s2i. It’s a way to create a Docker image out of some source code and a builder Docker image – for example a Python s2i image. It makes OpenShift 3’s Docker-based workflow feel more like OpenShift 2 or Heroku.

Image from http://blog.octo.com/en/openshift-3-private-paas-with-docker/

One problem I ran into with s2i-python was the lack of system packages needed to build things like Pillow. For this we need to extend the base image and add what we want. You can see the end result on GitHub.

Let’s review the Dockerfile I built. We start by extending CentOS’s s2i image. Note I’m using Python 3.4. You can review the upstream project here to see what I’m extending.

FROM centos/python-34-centos7

Next I’m mimicking how CentOS installs packages in the upstream file. Compare the upstream version and my version.

You can see I installed the PostgreSQL client tools and Node too, which I need. I also install a couple of pip packages I know I’ll need on all my Python projects – this is done just to speed up build time. Do as much or as little as you want.

Next I’ll post it on Docker Hub with automated builds. See here. I added centos/python-34-centos7 as a linked repository. This way my Docker image rebuilds any time the upstream image builds – ensuring I get security updates.

[Screenshot: Docker Hub automated build settings with centos/python-34-centos7 as a linked repository]

Finally, I just use the image as I would the original s2i Python image in OpenShift. I can add it as I would any image with oc import-image s2i-python --from="thelab/s2i-python:latest" --confirm

In the next post I’ll describe how to reuse a customized s2i Python image for local development with Docker Compose.

Finding near locations with GeoDjango and PostGIS Part II

This is a continuation of part I.

Converting phrases into coordinates

Let’s test out Nominatim – the OpenStreetMap tool for searching their map data. We basically want to turn whatever the user types in into map coordinates.

from geopy.geocoders import Nominatim
from django.contrib.gis.geos import Point

def words_to_point(q):
    # Note: newer geopy releases require a user_agent, e.g. Nominatim(user_agent="store-locator")
    geolocator = Nominatim()
    location = geolocator.geocode(q)
    point = Point(location.longitude, location.latitude)
    return point

We can take vague statements like New York and get the coordinates of New York City. So easy!
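As a quick sanity check in the Django shell (the import path is wherever you put the helper, and the coordinates shown are approximate):

>>> from locations.utils import words_to_point  # hypothetical module path
>>> point = words_to_point("New York")
>>> point.x, point.y  # longitude, latitude – roughly (-74.0, 40.7)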

Putting things together

I’ll use Django REST Framework to make an API where we can query this data and return JSON. I’m leaving out some scaffolding code – so make sure you are familiar with Django REST Framework first – or just make your own view without it.

from django.contrib.gis.measure import D
from rest_framework import mixins, viewsets

# Adjust these imports to wherever your model, serializer, and geocoding helper live.
from .models import Location
from .serializers import LocationSerializer
from .utils import words_to_point


class LocationViewSet(mixins.ListModelMixin, viewsets.GenericViewSet):
    """ Searchable list of locations

    GET params:

    - q - Location search parameter (New York or 10001)
    - distance - Distance away in miles. Defaults to 50
    """
    serializer_class = LocationSerializer
    queryset = Location.objects.all()

    def get_queryset(self, *args, **kwargs):
        queryset = self.queryset
        q = self.request.GET.get('q')
        distance = self.request.GET.get('distance', 50)
        if q:
            point = words_to_point(q)
            distance = D(mi=distance)
            queryset = queryset.filter(point__distance_lte=(point, distance))
        return queryset
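For reference, the scaffolding I’m leaving out is roughly a standard router registration – something like the sketch below, where the URL prefix and module paths are whatever your project uses:

# urls.py – illustrative wiring for the viewset above
from rest_framework import routers

from .views import LocationViewSet

router = routers.DefaultRouter()
router.register(r'locations', LocationViewSet)

urlpatterns = router.urls

A request like /locations/?q=New+York&distance=10 then returns the nearby locations as JSON.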

My Central Park location shows up when I search for New York, but not for London. Cool.
[Screenshot: the API returning the Central Park location for a “New York” query]

At this point we have a usable backend. Next steps would be to pick a client side solution to present this data. I might post more on that later.

Finding near locations with GeoDjango and PostGIS Part I

With GeoDjango we can find places in proximity to other places – this is very useful for things like a store locator. Let’s use a store locator as an example. Our store locator needs to be able to read in messy user input (zip, address, city, or some combination). Then it needs to locate any stores we have nearby.

General concept and theory

[Diagram: turning user input into a point, then querying for nearby store locations]

We have two problems to solve. One is to turn messy address input into a point on the globe. Then we need a way to query this point against other known points and determine which locations are close.

Set up known locations

Before we can really begin we need to set up GeoDjango. You can read the docs or use docker-compose.

It’s still a good idea to read the tutorial even if you use docker.

Let’s add a Location model. Something like:

from django.contrib.gis.db import models


class Location(models.Model):
    name = models.CharField(max_length=70)
    point = models.PointField()

    objects = models.GeoManager()  # only needed on older Django versions; removed in Django 2.0

A PointField stores a point on the map. Because the Earth is not flat we can’t use simple X, Y coordinates. Luckily, you can almost think of latitude and longitude as X, Y. GeoDjango defaults to this (SRID 4326, i.e. WGS 84). It’s also easy to get latitude and longitude from places like Google Maps. So, if we want, we can ignore the complexities of mapping coordinates onto the Earth. Or you can read up on SRID if you want to learn more.
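One gotcha worth a quick example: Point takes coordinates in (x, y) order, which means (longitude, latitude). Here is a minimal sketch of adding a location from the Django shell – the app path, name, and coordinates are just illustrative:

from django.contrib.gis.geos import Point

from locations.models import Location  # hypothetical app path

# Point(x, y) == Point(longitude, latitude)
Location.objects.create(name="Central Park", point=Point(-73.96, 40.78))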

At this point we can start creating locations with points – but for ease of use, add GeoModelAdmin to the Django admin so you can set points with an OpenStreetMap widget.

from django.contrib import admin
from django.contrib.gis.admin import GeoModelAdmin
from .models import Location

@admin.register(Location)
class LocationAdmin(GeoModelAdmin):
    pass

[Screenshot: the OpenStreetMap point widget in the Django admin]

Wow! We’re doing GIS!

Add a few locations. If you want to get their coordinates just type location.point.x (or y).

Querying for distance

Django has some docs for this. Basically, make a new point, then filter by distance. Like this:

from django.contrib.gis.geos import fromstr
from django.contrib.gis.measure import D
from .models import Location

geom = fromstr('POINT(-73 40)')  # longitude, latitude
Location.objects.filter(point__distance_lte=(geom, D(m=10000)))

m is meters – D accepts plenty of other units too (mi, km, ft, and so on). The result should be a queryset of Locations within that distance of our “geom” point.
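For example, D converts between units, so the same query can be written in miles just as easily (a quick sketch):

from django.contrib.gis.measure import D

D(mi=5)        # 5 miles
D(km=10).mi    # unit conversion – roughly 6.21
D(ft=5280).mi  # 1.0

# The same nearby query, 10 miles instead of 10,000 meters
Location.objects.filter(point__distance_lte=(geom, D(mi=10)))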

Already we can find locations near other locations or arbitrary points! In Part II I’ll explain how to use OpenStreetMap to turn a fuzzy query like “New York” into a point. And from there we can make a store locator!

Review of Tutanota

Tutanota is an open source email provider. It features easy-to-use end-to-end encryption. It’s notable as a modern, libre, and cheap hosted email service.

Why not gmail

Gmail is easily the best email provider. It’s light years ahead of any open source system. It’s gratis for individual, education, nonprofit, and small business use. That makes it hard to compete with. However, it’s also a giant proprietary service handling a huge portion of your communication with the world. As a free software advocate it’s hard to feel good about using Google services – especially ones as critical as communication. Google also took concerning steps to remove interoperability from its chat service by dropping XMPP support in Google Hangouts.

In looking at alternatives I want something hosted (I do not want to deal with tech problems at home for something as important as email). I’d like something cheap – because even cheap is more expensive than Gmail. I also want something modern-looking and easy to use.

The Good

Tutanota’s big feature is security. It’s not hosted in the United States. You can encrypt emails by clicking a button. It doesn’t sell your information to ad networks. It doesn’t leak your personal information when such ad networks or government agencies get hacked.

Tutanota has a modern web interface with native mobile apps (web wrapper, but acceptable). It’s minimalist but that isn’t necessarily bad.
[Screenshot: the Tutanota web interface]

Tutanota is easy to use. You can use their hosted version if you trust them and don’t want to host yourself. Setting up a custom domain was pretty easy. Aliases are no problem. It supports multiple users too (but I haven’t tested this out yet).

It’s cheap at 1€ per month per user for the premium account. It also has a free tier if you can go without a custom domain.

The Bad

Tutanota doesn’t support IMAP. That’s pretty annoying. I’d be more annoyed if there were a decent open source email client – but as there is not, I can forgive it. IMAP wouldn’t work with their security model anyway. I wish they offered it as an option with a big warning about the security trade-off.

It lacks a lot of features you expect from email. As I said, Gmail is way ahead here. No magic category filtering. No filtering rules. No contact book integration. Not even keyboard shortcuts.

Tutanota only recently open sourced its code. It needs to do more to promote a developer community. A public issue tracker would be nice. So would a contribution guide. Right now they only offer a user voice page that is more consumer-focused. I left my feedback.

Conclusion

Tutanota is a great option if you want an open source email provider – but only if your feature requirements are minimal. It has great potential as they add more features. The world needs an easy way to send private messages, and right now such privacy is a luxury for the very few who understand PGP encryption and can set up services themselves. I applaud anyone trying to make this easier.

Building an API for django-activity-stream with Generic Foreign Keys

I wanted to build a django-rest-framework API for interacting with django-activity-stream. Activity stream uses Generic Foreign Keys heavily, which aren’t naturally supported by DRF serializers. We can, however, reuse existing serializers and nest the data conditionally.

Here is a ModelSerializer for activity stream’s Action model.

from rest_framework import serializers
from actstream.models import Action
from myapp.models import ThingA, ThingB
from myapp.serializers import ThingASerializer, ThingBSerializer

class GenericRelatedField(serializers.Field):
    """Serialize a generic foreign key by dispatching on the related model."""
    def to_representation(self, value):
        if isinstance(value, ThingA):
            return ThingASerializer(value).data
        if isinstance(value, ThingB):
            return ThingBSerializer(value).data
        # Not found - return string.
        return str(value)

class ActionSerializer(serializers.ModelSerializer):
    actor = GenericRelatedField(read_only=True)
    target = GenericRelatedField(read_only=True)
    action_object = GenericRelatedField(read_only=True)

    class Meta:
        model = Action
        fields = '__all__'  # newer DRF versions require an explicit fields declaration

GenericRelatedField checks whether the value is an instance of a known model and serializes it with the matching serializer, falling back to its string representation otherwise.

Next we can use a viewset for displaying Actions. Since activity stream uses querysets, it’s pretty simple to integrate with a ModelViewSet. In my case I’m checking a GET parameter to determine whether we want all actions, actions of people the logged-in user follows, or only the user’s own actions. I added some filters on actor and target content type too.

from rest_framework import viewsets
from actstream.models import user_stream, Action
from .serializers import ActionSerializer


class ActivityViewSet(viewsets.ReadOnlyModelViewSet):
    serializer_class = ActionSerializer

    # Requires a filter backend (such as django-filter's DjangoFilterBackend) to be enabled.
    filter_fields = (
        'actor_content_type', 'actor_content_type__model',
        'target_content_type', 'target_content_type__model',
    )

    def get_queryset(self):
        following = self.request.GET.get('following')
        if following and following != 'false' and following != '0':
            if following == 'myself':
                # Only the logged-in user's own activity
                qs = user_stream(self.request.user, with_user_activity=True)
                return qs.filter(actor_object_id=self.request.user.id)
            else:  # Everyone else but me
                return user_stream(self.request.user)
        return Action.objects.all()
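Assuming the viewset gets routed somewhere like /api/activity/ (the prefix, and the use of the requests library below, are purely illustrative), the query parameters combine like this:

import requests

BASE = "http://localhost:8000/api/activity/"  # hypothetical dev URL
session = requests.Session()  # assume an authenticated session for the following filters

session.get(BASE)                                                  # every action
session.get(BASE, params={"following": "true"})                    # actions by people the user follows
session.get(BASE, params={"following": "myself"})                  # only the user's own actions
session.get(BASE, params={"actor_content_type__model": "thinga"})  # filter by actor model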

Here’s the end result, lots of nested data.
[Screenshot: the API response with nested actor, target, and action_object data]