How to speed up local Kubernetes development by proxying Helm charts

At Paxos, we’ve been running our services on Kubernetes for a while, but had trouble managing multiple environments until we adopted Helm and ChartMuseum. While these tools introduced a few troubles with local development at first, we made ChartProxy to address these woes, and the deployment process has never been smoother.

A Brief Primer on Helm and ChartMuseum

(Feel free to skip to the next section if you’re already familiar with these technologies.)

To write Kubernetes resource files for multiple environments, ConfigMaps will be your primary option to assign environment variables, set properties, and generally configure your resources. However, this is a fairly limiting approach, and you may find yourself copying entire files with minor changes. Helm is a package manager for Kubernetes, and it’s designed to alleviate these problems. It uses Golang’s template package with sprig functions to fill in variables, conditionally include code and even fill in entire resources (or portions thereof). Once you’ve written your templates, you can use Helm to package your resources into a single deployable chart, which can depend on other charts. You can then deploy this chart to your Kubernetes cluster and easily manage upgrades and rollbacks.

ChartMuseum is a chart repository that easily integrates with Helm. It manages your chart versions and allows your team to share charts with each other and with your production environment. Crucially, for our purposes, ChartMuseum exposes a simple REST API that’s easy to work with.

This barely scratches the surface of the use cases of Helm; if you want to learn more, check out the official documentation. Just be sure to come back here to increase your team’s local development productivity!

Problems with Local Development

We ran into problems with chart development nearly as soon as we began to convert our deploys to Helm. Because we were using ChartMuseum, we noticed that our deploy scripts contained many references to our chart repository (e.g. `paxos-charts/my-chart`). This meant that any time we ran our deployment, Helm would only use the chart in our shared repository. While this is the desired behavior when deploying to production, it’s the exact opposite of what we wanted to develop a chart and deploy it locally.

Initially, we attempted two strategies to get around this, but neither worked very well. In our first attempt, we simply pushed our in-progress charts to our shared repo. This approach worked perfectly when one engineer was working on the conversion, but as you can guess, it quickly fell apart when we started adding more engineers to the project. We found that an individual’s deploy could be easily broken by development changes made by another engineer to a different chart.

Afterwards, we made local changes to our deployment scripts to reference our local charts instead of the ones stored in ChartMuseum. This involved changing `paxos-charts/my-chart` to `/path/to/my/local/chart`, and then changing it back before committing. This process addressed the chaos of free-for all chart publishing of our first attempt, but was annoying, slow, and prone to error.

Enter ChartProxy

What we really wanted was a solution that allowed us to freely publish charts without interfering with each other. Fortunately, this isn’t a new problem, and the world of package management offers plenty of analogous solutions, such as `mvn install` and `npm link`. With this in mind, we made ChartProxy, a simple Bash script that manages a local ChartMuseum and provides access to our shared charts.

Working with ChartProxy is super simple and only requires adding a few commands to our process. First, `chartproxy start` starts the local ChartMuseum and re-directs all calls bound for our remote repo to our local one instead. This works seamlessly for both publishing and retrieving charts, so none of our deployment scripts need to be changed. Next, `chartproxy update` grabs a chart that it doesn’t already have from our shared repo and adds them locally. This way, we’re able to fully use our latest production charts in conjunction with whatever chart we happen to be working on in development. Finally, `chartproxy stop` stops the local ChartMuseum and points helm back to our shared repo. With this in place, we only need to update the Helm `requirements.lock` file before committing.

After rolling out ChartProxy, we realized that we had the potential to solve another problem. While switching to Helm, we were simultaneously modifying our AWS VPCs. This occasionally resulted in engineers losing access to our remote ChartMuseum. Fortunately, ChartProxy pulls down a full copy of the remote repo, allowing an engineer with the proper files to tar them up and send them to whomever needed them. We ended up doing this enough times that we added `pack` and `unpack` commands to simplify the process and prevent any mistakes.

One of the best parts of being a software engineer is the ability to create your own tools. Now, with ChartProxy, we can use the full powers of Helm and ChartMuseum for local and remote deploys without fiddling with temporary code changes or interfering with each other’s development. Give ChartProxy a try if you’re struggling to unify your chart deployments across development and production.

The Paxos Engineering team is always creating clever solutions to solve problems big and small. Interested in joining a team that values ingenuity, impact and cold brew coffee? Join us!

Shrink your ops team by putting the Dev in DevOps

Every software system requires an exhaustive amount of configuration and maintenance: from provisioning servers, deploying code, installing and upgrading services, to storing data, managing DNS records and applying security protocols, there is a tremendous amount to consider. At Paxos, we run extensive microservice deployments in various environments on AWS - maintaining them at a high level while keeping overhead minimal and visibility high is critical, yet challenging. Without an easy way to set up and orchestrate a system, the system can quickly become difficult to understand and maintain. A series of configuration decisions and manual configurations can lead to a system that is inefficient, hard to support and highly unpredictable. Some of the challenges engineers face today around systems are:

  • Production systems can be hard to maintain if they are set up without documentation
  • Time to set up can be very long, especially if setup has errors
  • Money could be spent inefficiently on servers that no one knows had been setup, or their specs are not tracked properly  
  • Even more money could be wasted on hiring and training dedicated personnel to set up and troubleshoot the system
  • Ramping new employees up to maintain a system is a lengthy process, and codifying the architecture makes it significantly easier for a new engineer to understand all components of the system
  • It is hard to precisely replicate environments (Dev, QA and Prod)
  • It is hard to restore a crashed / destroyed system

Enter Terraform

Terraform allows every engineer to simply “Write, Plan, and Create Infrastructure as Code”. This framework’s code can be applied to most cloud providers and supports several on-prem capable systems. Terraform is fully extensible, and anyone can write a provider that supports  a new type of infrastructure. Almost any stateful system with an API can have a usable Terraform provider. A list of all official providers can be found here. There is also a community of unofficial providers  that are written by the open source community. Once written, Terraform modules can be invoked to provision servers (such as EC2 instances or RDS instances on AWS) and set up routing, permissions, access controls and DNS records, which requires little to no manual work. Such system setup is easily repeatable, allowing for the same setup for different environments, while simultaneously creating an accurate recovery story. Terraform is modular and allows every service to be set up and torn down separately, which simplifies troubleshooting a service’s setup and configuration. Terraform supports modules, which are functions with side-effects (variables are parameters and outputs are return values), and can be called from other modules via a flexible source-reference mechanism.

Using Terraform as code that is checked into source control allows the system to be reviewed, documented and tracked. This proves to be highly valuable to make and present architectural decisions, which makes it easy to review the system’s architecture.

Beyond that, using terraform forces a unified process of updating systems, thereby preventing a single person from going rogue and making changes without the rest of the team knowing.

Along with Terraform, Terragrunt allows sending parameters to Terraform. While a terraform code can have variables (such as environments, subnets, domain names, etc), the same code can be invoked multiple times by Terragrunt by providing environment specific values to terraform - for QA, Dev, Staging or production.

Terraform plays nicely with a variety of tools such as Packer, which is used to create cloud images (ex. AWS AMIs), and Chef (Deployment and Configuration server based software deployment). In this post, I’m adding an example that shows how to use packer to create a Cassandra image. I have found Packer to be very helpful in automating the creation of images for our deployments.

Terraform can save its setup state in JSON format, which allows for incremental (or staggered) deployment. Once a state has been saved, it can be read and updated incrementally or deleted.  

Here’s an example of how one can setup a typical microservices deployment with Terraform, and then replicate it across environments in AWS:  

  1. Set up the management VPC, security groups and IAM roles, which automatically get written to S3.
  2. Set up management tools (such as Jenkins, Prometheus and Grafana and Security modules). These new objects get written to S3.
  3. Set up the environment VPC, security groups and subnets (with peering to the management VPC) - this environment will depend on the Terragrunt parameters, such as environment names and IP ranges.
  4. Using the “terraform_remote_state” rule, we can read the remote state and set up instances of EC2, RDS, Cassandra, Kubernetes or any other object that are based on the stored resources (VPCs, subnets, iam roles to name a few) that were written to S3.
  5. Set up a kubernetes cluster for the applications based on the database IPs that were written to S3.

Use case - Cassandra and Packer Terraform

To demonstrate our use of Terraform, I have published an example of Cassandra terraform usage.

A repository with Terragrunt and Terraform code is live here: https://github.com/paxos-bankchain/terraform-examples

In this example, we set up three Cassandra nodes in a private VPC and subnet, as well as a bastion host to allow SSHing into them. Our Terraform code builds an AMI using packer, creates the cassandra using the AMI it created, and then sets the Cassandra boxe’s IPs and other settings in the cassandra nodes for the cluster to come up.

At a high level, by going to the terraform-examples/terragrunt/test directory and running Terragrunt apply-all, a cassandra cluster will be set up in a private VPC along with a bastion host.

This code demonstrates the following key capabilities Terragrunt has:

  • The backend is set to S3 with connection details specified in Terragrunt.
  • Variables are passed in from Terragrunt.
  • Packer build - this section creates an AMI image based on a hash of changes (if nothing has changed, don’t create a new AMI). Packer builds are defined in image.json
  • ENI - network interface object is created to allow IPs to be consistent and known during and across instance creations.
  • Creating EC2 instances as Cassandra nodes
  • Running custom config (IPs, etc.) with user_data once an instance is created.

We have found that this combination of steps works very well and allows us to replicate the same logic to most other types of applications (Kafka, Zookeeper and Nginx to name a few).

Summary

Terraform has many provisioners that allow you to address all aspects of your environment setup and customization.

We set up various objects in a staggered manner and run Terraform code via Terragrunt to set up management, VPCs, IAM roles, Security Groups and application hosts. Terraform state is backed by S3.

Terraform comes with its own set of challenges: often, one has to think creatively about how to create a form that sets up a specific system. Using the “null_resource” wildcard allows you to define any custom action that is not officially supported, which we had to rely on significantly.

When developing in Terraform, it is recommended to first perform the actual installation on a bare EC2 instance, and then re-create the same steps in Packer for image, Chef for certain applications and recipes, bash for configuration (“user_data”), and Terraform to setup the IAM, VPC and Hosts.

Running Terraform allows for absolute visibility into our systems and for the ability to replicate and reproduce environments; due to these important benefits, we choose to use Terraform to set up our environments at Paxos.

 


Interested in working with these technologies (and more)? Join us!

Should I Migrate to an Async Framework?

The main problem with (traditional) synchronous code, is that processes hang while waiting for external data. Have you ever wondered “why do we need to provision yet another beefy AWS machine?” It may be because you’re not using your existing ones efficiently.


At Paxos, we think asyncio is python3’s killer feature. When we talk about python3, the first thing people usually ask is if they should switch to using it in their existing codebase. Here’s what we wrote in the past:

“With more python3 library support than python2, there’s no good reason to build a new app using python2. It makes sense to use asyncio if your app wastes a lot of cycles waiting on IO, is a good fit for an asynchronous framework (especially websockets), and resource intensive (reduce your server bill).”


Performance

When properly implemented, asyncio (with uvloop) can be very fast (link):

“uvloop makes asyncio fast. In fact, it is at least 2x faster than nodejs, gevent, as well as any other Python asynchronous framework. The performance of uvloop-based asyncio is close to that of Go programs.”


For example, in this cherry-picked benchmark, aiohttp had ~26x higher throughput (vs Flask): Screen Shot 2017-10-11 at 2.05.09 PM.png


An Example

Suppose you have a simple website that is really just a visual wrapper for an API call. When a visitor loads your site, your server makes an external API call (say via HTTP), processes the results and serves them in pretty HTML. This API normally returns responses in < 50ms, and the server takes another 10ms to perform any business logic and render the HTML. So pages load in < 60 ms, and your end users are happy viewing your content.

Under normal circumstances, this will work sufficiently well for users. The only issue is that during those 50ms, the process remains blocked waiting on a response. No other code can be run by that process. So what happens if you get two requests at the same time? Any production server will route traffic to another worker (exactly how to do that routing is not so simple). As long as you have more workers available than requests, all your requests can be routed immediately and your users will remain happy. Pages will continue to load in < 60 ms.

Let’s say your site is running well with no changes, and you get on the front page of Hacker News. Suddenly, your pages are taking two seconds to load and you are trying to figure out what's wrong. What happened? If you have more requests than workers available, requests will start to queue up waiting for available workers. You can find yourself in a situation where your processes are all individually sitting idle waiting for 50 ms API responses, and you have a queue backing up of new requests that are waiting for an available worker.

Alternatively, imagine that your site is running well with no changes and you don’t even have a spike in traffic, but pages start taking two seconds to load. You find that the API (that normally returns in ~50 ms) is having issues and starts returning in 500 ms. This is obviously bad for end users, as loading your webpage now takes a minimum of 510 ms to return. However, since threads will be blocked while waiting, your response times could be much slower than that. Your site will queue up lots of requests without any increase in load!

As you think through this example, it becomes obvious that it’s true for any heavy IO app. A slow external API call is no different from a slow database query.


Existing Solutions

This sounds like a big problem in the synchronous paradigm, as all of these blocked workers affect your bottom line. Companies pay money to have enough servers to serve requests quickly. Since blocked workers are unavailable to do other work, companies must increase their server budget to increase the number of workers available to pick up the slack.

In many cases, this solution works well enough. After all, most languages/frameworks are much easier to code in synchronously, and skilled developers are often much more expensive than hardware. In the words of Stack Overflow co-founder Jeff Atwood:

“Hardware is cheap, and programmers are expensive.”

However, as more solutions like node and asyncio make asynchronous programming easily accessible, more developers will learn to use these powerful tools.

 

How Asyncio Solves the Problem

With asyncio, your code is run inside an event loop. This means that when an API request is awaiting a response, control is returned back to the event loop. Instead of sitting idling by, another block of code (that’s not awaiting IO) can be executed. This effectively means you waste no CPU cycles while you await IO processes. For requests with high latency and/or high response times, these savings can be massive.

 Event-Loop.png

There are other solutions like using processes or threads, but these are much harder to manually manage.

 

Migrating

So, should you switch to an asynchronous framework?

 xkcd.png

Source: https://xkcd.com/1691/


When it comes to migrating a codebase, that’s a complicated task of weighing pros/cons and is unique to your situation. If you’re suffering from the problems above, you’d probably already know it.

 

Greenfield Projects

However, if you’re starting a new project where the cost of switching to an asynchronous framework is mostly just learning some new syntax, you should definitely check it out. Both aiohttp and sanic are very easy to get started, they even look a lot like Flask. You can check out this aiohttp hello world example here and this sanic hello world example here. This blog post contains lots of helpful code snippets.

 

Negatives

This sounds too good to be true! What are the drawbacks?

  • Hard to use the interpreter (outside of event loop)
  • Must use exclusively async code
  • You have to think about the event loop
  • Minor difficulties: fewer testing tools, less documentation, etc

We dicussed some of the other downsides previously on this post, but these things are all manageable and you should be on your feet in no time.

 

Conclusion

Asynchronous programming can increase your server efficiency to save you money, and asyncio makes it easy to do. Happy asynchronous programming!

 


Interested in working with these technologies (and more)? Join us! 

Blockchain 101 - Elliptic Curve Cryptography

In this series of articles, I’m aiming to give you a solid foundation for blockchain development. In the last article, we gave an overview of the foundational math, specifically, finite fields and elliptic curves. In this article, my aim is to get you comfortable with elliptic curve cryptography (ECC, for short). This lesson builds upon the last one, so be sure to read that one first before continuing.

The Magic of Elliptic Curve Cryptography

Finite fields are one thing and elliptic curves another. We can combine them by defining an elliptic curve over a finite field. All the equations for an elliptic curve work over a finite field. By “work”, we mean that we can do the same addition, subtraction, multiplication and division as defined in a particular finite field and all the equations stay true. If this sounds confusing, it is. Abstract algebra is abstract!

Of course, the elliptic curve graphed over a finite field looks very different than an actual elliptic curve graphed over the Reals. An elliptic curve over real numbers looks like this:

Blockchain101-graphs-06.png

An elliptic curve over a finite field looks scattershot like this:

Blockchain101-graphs-10.png

How to calculate Elliptic Curves over Finite Fields

Let’s look at how this works. We can confirm that (73, 128) is on the curve y2=x3+7 over the finite field F137.

$ python2
>>> 128**2 % 137
81
>>> (73**3 + 7) % 137
81

The left side of the equation (y2) is handled exactly the same as  in a finite field. That is, we do field multiplication of y * y. The right side is done the same way and we get the same value.

Exercise

True or False: Point is on the y2=x3+7 curve over F223

1. (192, 105)
2. (17, 56)
3. (200, 119)
4. (1, 193)
5. (42, 99)

Highlight to reveal answers: 1. True, 2. True, 3. False, 4. True, 5. False


Group Law

The group law for an elliptic curve also works over a finite field:

Curve:y2=x3+ax+b
P1=(x1,y1)   P2=(x2,y2)   P1+P2=(x3,y3)
When x1≠x2:
s=(y2-y1)/(x2-x1)
x3=s2-x1-x2
y3=s(x1-x3)-y1
 


As discussed in the previous article, the above equation is used to find the third point that intersects the curve given two other points on the curve. In a finite field, this still holds true, though not as intuitively since the graph is a large scattershot. Essentially, all of these equations work in a finite field. Let’s see in an example:

Curve: y2=x3+7
Field: F137
P= (73, 128)   P= (46, 22)

Find P1+P2

First, we can confirm both points are on the curve:
1282% 137 = 81 = (733+7) % 137
222% 137 = 73 = (463+7) % 137

Now we apply the formula above:
s = (y2-y1)/(x2-x1) = (22-128)/(46-73) = 106/27
 

To get 1/27, we have to use field division as we learned last time.

Python:

>>> pow(27, 135, 137)
66
>>> (106*66) % 137
9

We get s=106/27=106*66 % 137=9. Now we can calculate the rest: 

x= s2-x1-x= 92-46-73 = 99
y= s(x1-x3)-y= 9(73-99)-128 = 49 

We can confirm that this is on the curve:
492% 137 = 72 = (993+7) % 137 

P1+P= (99, 49)

 

Exercise

Calculate the following on the curve: y2=x3+7 over F223 

1. (192, 105) + (17, 56)
2. (47, 71) + (117, 141)
3. (143, 98) + (76, 66)

Highlight to reveal answers: 1. (170, 142), 2. (60, 139), 3. (47, 71)

 

Using the Group Law

Given a point on the curve, G, we can create a nice finite group. A group, remember, is a set of numbers closed under a single operation that’s associative, commutative, invertible and has an identity. 

We produce this group, by adding the point to itself. We can call that point 2G. We can add G again to get 3G, 4G and so on. We do this until we get to some nG where nG=0. This set of points {0, G, 2G, 3G, 4G, … (n-1)G} is a mathematical group. 0, by the way, is the “point at infinity”. You get this point by adding (x,y) + (x,-y). Given that (x,y) is on the curve (x,-y) is on the curve since the left side of the elliptic curve equation has a y2. Adding these produces a point that’s got infinity for both x and y. This is what we call the identity. 

It turns out that calculating sG = P is pretty easy, but given G and P, it’s difficult to calculate s without checking every possible number from 1 to n-1. This is called the Discrete Log problem and it’s very hard to go backwards if n is really large. This s is what we call the secret key.

Because the field is finite, the group is also finite. What’s more, if we choose the elliptic curve and the prime number of the field carefully, we can also make the group have a large prime number of elements. Indeed, that’s what defines an elliptic curve for the purposes of elliptic curve cryptography.

 

Defining a Curve

Specifically, each ECC curve defines:

  • elliptic curve equation
    (usually defined as a and b in the equation y2 = x3 + ax + b)
  • p = Finite Field Prime Number
  • G = Generator point
  • n = prime number of points in the group

The curve used in Bitcoin is called secp256k1 and it has these parameters:

  • Equation y2 = x3 + 7 (a = 0, b = 7)
  • Prime Field (p) = 2256 - 232 - 977
  • Base point (G) = (79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798, 483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
  • Order (n) = FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

The curve’s name is secp256k1, where SEC stands for Standards for Efficient Cryptography and 256 is the number of bits in the prime field.

The big thing to note about this curve is that n is fairly close to p. That is, most points on the curve are in the group. This is not necessarily a property shared in other curves. As a result, we have something pretty close to 2256 possible secret keys.

 

How Big Is 2256?

Note that 2256 is a really large number. It’s around 1077, which is way more than the number of atoms in our galaxy (1057). It’s basically inconceivable to calculate all possible secret keys as there are simply too many of them. A trillion computers doing a trillion operations every picosecond (10-12 seconds) for a trillion years is still less than 1056 operations.

Human intuition breaks down when it comes to numbers this big, perhaps because until recently we’ve never had a reason to think like this; if you’re thinking that all you need is more/faster computers, the numbers above haven’t sunk in.

 

Working With Elliptic Curves

To begin working with elliptic curves, let’s confirm that the generator point (G) is on the curve (y2 = x3 + 7)

G = (79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798, 483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

p = 2256 - 232 - 977

y2 = x3 + 7

$ python2
>>> x = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
>>> y = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
>>> p = 2**256 - 2**32 - 977
>>> y**2 % p == (x**3 + 7) % p
True

 

Remember, we’re always working in the Prime Field of p. This means that we always mod p for these operations.

Next, let’s confirm that G has order n. That is, nG = 1. This is going to require the use of a python library called pycoin. It has all of the secp256k1 curve parameters that we can check. Similar libraries exist for other languages. Note that the actual process is a bit more complicated and the reader is encouraged to explore the implementation for more details.

 

G = (79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798, 483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

n = FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

$ python2:
>>> n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
>>> from pycoin.ecdsa import generator_secp256k1 as g
>>> (n*g).pair()
(None, None)

 

(None, None) is actually the point at infinity, or the identity for point-addition.

 

Utilizing ECC for Public Key Cryptography

Private keys are the scalars, usually donated with “s” or some other lower case letter. The public key is the resulting point of the scalar multiplication or sG, which is usually denoted with “P”. P is actually a point on the curve and is thus two numbers, the x and y coordinate or (x,y).

Here’s how you can derive the public key from the private key:

Python:

>>> from pycoin.ecdsa import generator_secp256k1 as g
>>> secret = 999
>>> x, y = (secret*g).pair()
>>> print(hex(x), hex(y))
('0x9680241112d370b56da22eb535745d9e314380e568229e09f7241066003bc471L', '0xddac2d377f03c201ffa0419d6596d10327d6c70313bb492ff495f946285d8f38L')

 

Exercise

1. Get the public points for s in (7, 1485, 2128, 2240+231) in the secp256k1 curve.
2. Confirm the resulting points lie on the secp256k1 curve.

Highlight to reveal answers:

  1. (5CBDF0646E5DB4EAA398F365F2EA7A0E3D419B7E0330E39CE92BDDEDCAC4F9BC, 6AEBCA40BA255960A3178D6D861A54DBA813D0B813FDE7B5A5082628087264DA), (C982196A7466FBBBB0E27A940B6AF926C1A74D5AD07128C82824A11B5398AFDA, 7A91F9EAE64438AFB9CE6448A1C133DB2D8FB9254E4546B6F001637D50901F55), (8F68B9D2F63B5F339239C1AD981F162EE88C5678723EA3351B7B444C9EC4C0DA, 662A9F2DBA063986DE1D90C2B6BE215DBBEA2CFE95510BFDF23CBF79501FFF82), (9577FF57C8234558F293DF502CA4F09CBC65A6572C842B39B366F21717945116, 10B49C67FA9365AD7B90DAB070BE339A1DAF9052373EC30FFAE4F72D5E66D053)

 

SEC Format

The private keys are just 256 bit numbers, but the public keys are actually 2 different 256-bit numbers. This means that we need to serialize them. The same organization (Standards for Efficient Cryptography) created a format for this very purpose. There are two versions, compressed and uncompressed. Let’s start with the uncompressed version:

The first point from exercise 1 above is:

(x, y) = (5CBDF0646E5DB4EAA398F365F2EA7A0E3D419B7E0330E39CE92BDDEDCAC4F9BC, 6AEBCA40BA255960A3178D6D861A54DBA813D0B813FDE7B5A5082628087264DA)

 

In uncompressed SEC, we concatenate the byte “04”, then the X-coordinate and then the Y-coordinate. It looks something like this in hex:

045CBDF0646E5DB4EAA398F365F2EA7A0E3D419B7E0330E39CE92BDDEDCAC4F9BC6AEBCA40BA255960A3178D6D861A54DBA813D0B813FDE7B5A5082628087264DA

 

Because the x and y coordinates are 32-bytes (256 bits) each, the length of an uncompressed SEC format public key is 65 bytes.

It turns out this is a little bit inefficient. If we know the x coordinate, there are only two possible y-coordinates, the positive and negative (odd and even in a finite field). Thus, they came up with a compressed SEC format. The first byte is “02” if y is even, “03” if y is odd. Then we concatenate the x-coordinate. The above point in Compressed SEC format is this:

025CBDF0646E5DB4EAA398F365F2EA7A0E3D419B7E0330E39CE92BDDEDCAC4F9BC

This is because the y-coordinate ends in A, which is even in hex. Note that compressed keys are always 33 bytes (1 byte + 32 byte x-coordinate)

Exercise

Find the compressed and uncompressed SEC format for the public keys where the secret key is:

1. 9993
2. 123
3. 42424242 

Highlight to reveal answers:

  1. 049D5CA49670CBE4C3BFA84C96A8C87DF086C6EA6A24BA6B809C9DE234496808D56FA15CC7F3D38CDA98DEE2419F415B7513DDE1301F8643CD9245AEA7F3F911F9, 039D5CA49670CBE4C3BFA84C96A8C87DF086C6EA6A24BA6B809C9DE234496808D5

  2. 04A598A8030DA6D86C6BC7F2F5144EA549D28211EA58FAA70EBF4C1E665C1FE9B5204B5D6F84822C307E4B4A7140737AEC23FC63B65B35F86A10026DBD2D864E6B, 03A598A8030DA6D86C6BC7F2F5144EA549D28211EA58FAA70EBF4C1E665C1FE9B5

  3. 04AEE2E7D843F7430097859E2BC603ABCC3274FF8169C1A469FEE0F20614066F8E21EC53F40EFAC47AC1C5211B2123527E0E9B57EDE790C4DA1E72C91FB7DA54A3, 03AEE2E7D843F7430097859E2BC603ABCC3274FF8169C1A469FEE0F20614066F8E

 

Conclusion

In this lesson, we learned how to combine finite fields and elliptic curves to create a finite group for use in public key cryptography. Next time, we’ll show how to convert SEC format public keys to Bitcoin Addresses and how we can sign and verify messages using the math learned here.

 

Interested in working with these technologies (and more)? Join us! 

Why aren’t distributed systems engineers working on blockchain technology?

For the last 15 years, I have predominantly worked on Distributed Systems of various size and complexity. Over the last couple of years, I have become more interested in blockchain or Distributed Ledger Technology (DLT). Coming from a Distributed Systems background, it seemed natural to me to understand more about this innovative technology. So it’s surprising that, although there has been a lot of buzz around the crypto community about DLT, the Distributed Systems community has largely remained away from the DLT buzz. A few reasons I can think of for this distance:

  1. Distributed Systems engineers are usually dealing with extremely large amounts of data. In comparison 3 txs/second on the Bitcoin blockchain seems very miniscule.

  2. BFT consensus algorithms are not widely used(or popular) in the industry. I will touch upon this later in the blog.

  3. The blockchain community is immature and historically unwelcoming.

  4. It feels like a get rich quick scheme - https://twitter.com/naval/status/878018839044161536

In this article, I will explore why DLT is an extremely interesting technology for Distributed Systems engineers.

Note: Throughout this post, I will be using the terms “blockchain” and “distributed ledger technology (DLT)” interchangeably.


Commonalities between DLT & Distributed Systems

A lot of design principles used in DLT are also common design patterns in Distributed Systems:

icon-immutable

Immutability

Blockchains are immutable. And for quite some time, distributed systems have relied on immutability to eliminate anomalies. Log-structured file system, log-structured merge-trees, and Copy-On-Write are common patterns/tricks used in Distributed Systems to model immutable data structures. Blockchains handle transactions in a similar way to event sourcing, the common technique used in Distributed Computing to handle facts and actions. Instead of overwriting data, you create an append-only log of all facts/actions that ever happened.

Pat Helland described the importance of immutability in his popular paper Immutability Changes Everything:

Accountants don't use erasers; otherwise they may go to jail. All entries in a ledger remain in the ledger. Corrections can be made but only by making new entries in the ledger. When a company's quarterly results are published, they include small corrections to the previous quarter. Small fixes are OK. They are append-only, too.

Blockchains are simply distributed accounting ledgers, hence the name Distributed Ledger Technology.

 

Asynchrony

Asynchrony

Blockchains are designed to run on commodity machines that may be thousands of miles apart. Arriving at a common history of transaction order in this kind of asynchronous network is the classic Distributed Computing problem that Distributed Systems Engineers deal with. All the impossibility results in Distributed Systems like FLP and CAP apply to blockchain since it is also a Distributed System.

Much like in Distributed Systems, in blockchains there is no “now". Clocks of different nodes in a distributed system can drift apart. Thus it is not at all trivial to reason about the global real time ordering of events across all the machines in the cluster. Machine local timestamps will no longer help here since the clocks are no longer assumed to be always in sync. In addition to this, messages can be delayed for arbitrary period of times. The time lag between writing a packet onto the wire to receiving it on the other end could be on the order of milliseconds, seconds, minutes or even days. For the bitcoin blockchain, Satoshi devised a clever way of ordering transactions to prevent the problem of double spending, in the absence of a global clock using a Distributed Timestamp Server. Quoting Satoshi from his Bitcoin whitepaper:

The solution we propose begins with a timestamp server. A timestamp server works by taking a hash of a block of items to be timestamped and widely publishing the hash, such as in a newspaper or Usenet post. The timestamp proves that the data must have existed at the time, obviously, in order to get into the hash. Each timestamp includes the previous timestamp in its hash, forming a chain, with each additional timestamp reinforcing the ones before it.

This is similar to a DBMS (database management system) in which transaction logs record all writes to the database. To that extent, blockchain is essentially a Distributed Transaction Log.


consensus

Consensus

In the absence of a globally synchronized clock, the only way to decide on the order of transactions is through Distributed Consensus. Coming to consensus on the ordering of events/transactions across distributed machines can be thought of as a logical surrogate for “now”. But coming to consensus in a Distributed System is hard:

The FLP impossibility result states that in an asynchronous network where messages may be delayed but not lost, there is no consensus algorithm that is guaranteed to terminate in every execution for all starting conditions, if at least one node may fail-stop.

Crash Fault Tolerant consensus algorithms like Paxos, Zab, Raft, Viewstamped Replication are all too common in Distributed Systems literature and every major Distributed database or filesystem out there is using one (or a variant) of these algorithms. These crash fault tolerant algorithms are modeled to handle consensus in scenarios where processes or machines can crash or cause delays in message delivery. The aforementioned algorithms are often used in Distributed Systems that are run by one organization.

Blockchains work in much more adversarial conditions and are modeled to handle failures known as the Byzantine Generals’ Problem, where some of the nodes could be malicious. This is possible since the nodes are run by different entities/organizations that do not trust each other. Blockchains are modeled under the assumption that your own network is out to get you. Hence you need Byzantine Fault Tolerant algorithms to achieve consensus in blockchains. Byzantine Fault Tolerance has been studied in Distributed Systems literature for a long time. In 1999, Miguel Castro and Barbara Liskov introduced the Practical Byzantine Fault Tolerance (PBFT) algorithm, which provides high-performance Byzantine state machine replication, processing thousands of requests per second with sub-millisecond increases in latency. Although the paper was written in 1999, there were no known practical implementations of BFT algorithms until Bitcoin came along in 2008 with the Proof Of Work algorithm; although the algorithm itself is old and was used in other systems such as Hashcash to limit email spam. The Blockchain has created a renewed interest in BFT algorithms and a whole slew of new BFT algorithms are being researched very actively in academic circles. Some examples include Proof Of Stake, Bitcoin-NG, Tendermint and Honey Badger.


network reliability

Network reliability

Contrary to popular beliefs, the network is not reliable. Distributed Systems engineers have to deal with this cold hard fact. Bitcoin and other crypto currencies were built to work over the internet, where network partitions and message loss/reordering are common. Interestingly the blockchain data structure itself is a clever way to detect message loss and reordering. Every block has a pointer back to the previous block similar to a linked list which makes it easy to detect missing blocks( “orphaned blocks” in blockchain lingo). Quoting Satoshi again:

New transaction broadcasts do not necessarily need to reach all nodes. As long as they reach many nodes, they will get into a block before long. Block broadcasts are also tolerant of dropped messages. If a node does not receive a block, it will request it when it receives the next block and realizes it missed one.

This is similar in principle to replicating the transaction log or log shipping which is a common technique used to keep replicas(especially read-only replicas) synchronized. As transaction logs are ordered, they provide an easy mechanism to detect gaps and repair replicas. Similarly, the integrity of every block in a blockchain can be validated by checking the merkle root of the block which is a hash of all the transactions in the block. So it’s easy to detect a missing transaction. As a reminder, merkle trees are a very common technique used in anti-entropy repair for replica synchronization.


Blockchains are an exciting technological breakthrough. For the first time ever, you can have a distributed database across entities that do not trust each other. We are still in early days of this interesting technology. In many ways, it’s similar to writing the first distributed NoSQL database like Amazon’s Dynamo or Google’s BigTable. These distributed databases showed us a new way to build large scale databases and opened our eyes to new design patterns and data structures. NoSQL databases have now become commoditized. If you hear about a new NoSQL database, 90% of the patterns and algorithms used are the same. DLTs are going through a similar phase, and they will become commoditized eventually. But these are still early days and we are discovering the best patterns for building them.

The blockchain community has attracted a number of folks from the crypto community, but has relatively few Distributed Systems Engineers. I wrote this post to give a brief overview of the challenges facing DLT engineers and would like to call upon all the Distributed Systems Engineers to get excited about DLTs and start contributing. If you would like to be part of building the next generation DLT and invent the new algorithms and patterns that will be used for years to come, join us.

Interested in working with these technologies (and more)? Join us! 

Blockchain 101 - Foundational Math

Getting into blockchain development can be pretty intimidating. There’s a whole host of weird terms that are thrown around like “coinbase” and “merkle root” that not only look odd, but are not obvious. Add to that additional terms that look normal, but mean something specific to blockchain like “transaction”, “block” and “signature”.

In this series of articles, I aim to give you a gentle introduction into the world of blockchain development, or what I call “Blockchain 101". It sounds like a university course and it’s meant to be about at that level. You can read the posts here and get something out of them, but to get the full benefit, take time to do the exercises so you can really get an understanding of what’s going on.

In today’s article, we’re going to start on the foundational math. Our goal is to get you comfortable with finite fields and elliptical fields, which is a prerequisite to elliptical curve cryptography - the math behind blockchain.


Finite Fields

Finite fields are an important construct in mathematics and especially important in computer science. Think of them as a set of integers. When you take any two numbers from this set and add them, you get another number from the set. Same thing with subtraction, multiplication and division by anything other than 0. It may seem strange that you can divide two integers and come up with another integer, but that’s part of the magic, especially when you have a finite set.

So let’s start with a fairly simple field:

F19 = {0, 1, 2, … 18}

As you can see, there are exactly 19 elements in this field that consists of every number between 0 and 18, inclusive. For various mathematical reasons, it turns out fields that have a prime number of elements are the most interesting, which is why we chose 19 here. The field of order 19 is denoted F19. Fields of other orders are denoted with the number as the subscript.


Addition and Subtraction

We now have to define addition and subtraction so that adding any two elements or subtracting any two elements results in another element in this set. We can do this with modulo arithmetic using the order of the prime field (19).

F19

11 + 6 = 17 % 19 = 17

17 - 6 = 11 % 19 = 11

8 + 14 = 22 % 19 = 3

4 - 12 = -8 % 19 = 11

As you can see, no matter which two elements we select from F19 we will always end up with another element in F19.

Exercise

Solve these equations in F31:

2 + 15 = ?

29 - 4 = ?

17 + 21 = ?

15 - 30 = ?

Answers: 17, 25, 7, 16


Multiplication

We can also define multiplication on a finite field using modulo arithmetic:

F19

2 * 4 = 8 % 19 = 8

7 * 3 = 21 % 19 = 2

15 * 4 = 60 % 19 = 3

11 * 11 = 121 % 19 = 7

Again, we can multiply any two numbers from F19 and end up with another number in F19. Note we end up with some counter-intuitive results, like 7*3=2 and 11*11=7. Don’t worry too much about that, as finite field math can be a bit counter-intuitive, coming from the world of normal multiplication.


Exercise

1. Solve these equations in F31:

  24 * 19 = ?  173 = ?  55 * 18 = ?

2. Write a program to calculate 1*k, 2*k, 3*k, … 30*k for some k in F31. Notice anything about these sets?

3. Write a program to calculate k30 for all k in F31. Notice anything?

Answers: 1. 22, 15, 16; 2. The sets are all equal to F31; 3. k30 is always 1.


One quick note before we leave multiplication. The answer to #2 above is why we choose prime fields. The numbers in a prime field are basically equivalent whereas in a composite field, the set 1*k, 2*k, 3*k…(n-1)*k wouldn’t necessarily be equivalent to the field itself.

 

Division

Division, in finite field mathematics, is simply defined as the inverse of multiplication.

F19

2 * 4 = 8  => 8 / 4 = 2

7 * 3 = 2  => 2 / 3 = 7

15 * 4 = 3 => 3 / 4 = 15

11 * 11 = 7 => 7 / 11 = 11

As with multiplication, we have some really counter-intuitive results like ⅔ = 7, ¾=15 and 7/11=11. Once again, finite fields are somewhat counter-intuitive so please don’t worry about the weirdness of the results, but just how to calculate them. The more salient question is, how do we do the actual calculation?


Problem 3 of the last exercise gives us a clue. Generalizing a bit, we have:

np-1=1 mod p where p is prime

This is a beautiful result from number theory known as Fermat’s Little Theorem. If you care to convince yourself this is true, there’s a very interesting proof using necklaces in the Wikipedia article.


In any case, this implies:

np-2=n-1=1/n mod p where p is prime

This can help us learn how to do field division.

F19

np-2 = n-1

2 / 3 = 2 * 3-1 = 2 * 317 = 7

3 / 15 = 3 * 15-1 = 3 * 1517 = 4

Notice that we end up with the same result as before for ⅔. Incidentally, if you’re using python, there’s a very convenient function called “pow” which does modulo exponentiation:

Python: n-1 = pow(n, p-2, p)

 

Exercise

1. Solve these equations in F31:

3 / 24 = ?  17-3 = ?  4-4 * 11 = ?

2. Write a program to calculate k/1, k/2, k/3, … k/30 for some k in F31. Notice anything about these sets?

Answers: 1. 4, 29, 13; 2. The sets are all equal to F31

 

Elliptical Curves

To introduce elliptical curves, we’re going to start by looking at various equations from high school algebra. Let’s take a look first at a linear equation: y=mx+b

01-Blockchain101-graphs-01.png

A slightly more complicated equation is the quadratic equation: y=ax2+bx+c

Blockchain101-graphs-02.png

And finally, you have the cubic equation which looks like this: y=ax3+bx2+cx+d

Blockchain101-graphs-03.png

An elliptical curve, to be quite frank, isn’t that different. The equation looks like this: y2=x3+ax+b

Blockchain101-graphs-04.pngBlockchain101-graphs-05.png

 

Notice that it’s less steep than the cubic or quadratic curves. Notice also that it’s a mirrored over the x-axis. This is because the left side of the equation is y2, making it so that if y is a solution -y is also a solution.

Bitcoin uses what’s called the secp256k1 curve, and that equation is simply y2=x3+7 and looks like this:

Blockchain101-graphs-06.png

Now elliptical curves have an interesting property that we can define as point addition.

Blockchain101-graphs-07.png

Point addition stems from the fact that a line defined by two points on this curve will intersect the curve a third time. Essentially, the operation is as described in the diagram above. To add P and Q, you find the third intersection point, then you reflect over the x-axis.


Here’s a solid example. Suppose the curve and points that we want to look at are these:

Curve: y2=x3+5x+7

P1=(2,5) P2=(3,7)

We can clearly see that P1 and P2 are on the curve:

52=25=23+5*2+7

72=49=33+5*3+7

We can calculate the line that P1 and P2 makes my calculating the slope and y-intercept:

y=mx+b => m=(y2-y1)/(x2-x1)=2

5=2*2+b => b=1 =>

y = 2x + 1

We can now plug in the resulting line into the elliptical curve equation to solve for x:

(2x+1)2 = x3+5x+7 => x3-4x2+x+6 = 0

(x,y) = (-1,-1), (2,5), (3,7)

We know (2,5) and (3,7) are P1 and P2, which means that (-1, -1) is the third point. We now flip that point over the x-axis to get the result:

P1 + P2 = (-1, 1)

 

Group Law

This process of figuring out the third point is a bit easier to codify this way:

Curve:y2=x3+ax+b

P1=(x1,y1) P2=(x2,y2) P1+P2=(x3,y3)

When x1≠x2:

s=(y2-y1)/(x2-x1)

x3=s2-x1-x2

y3=s(x1-x3)-y1

It turns out that this process gives us something called a group. A group in mathematics is a set of numbers and a single operation that’s closed, associative, commutative, invertible and has an identity. When we use point addition on all the points of an elliptical curve, we get a group. Note that for a line that’s perfectly vertical, there’s another point called the “point at infinity”, which also serves as the identity for the group. Commutativity should be fairly obvious as two points make a line and it doesn’t matter which order you choose them. Associativity can be seen here:

Blockchain101-graphs-08.png

01-Blockchain101-graphs-09.png

Exercise

For the curve y2 = x3 + 5x + 7, and the following points:

P1=(3,7), P2=(18,77)

Calculate (using the group laws):

P1+P2=?

Verify this point is on the curve.

Answer: P1+P2=(7/9, 91/27)≈(0.7778, 3.3703)


Conclusion

We’ve covered Finite Fields and Elliptical Curves. They may seem unrelated, but next time, we’ll combine them to produce a very powerful cryptographic tool called Elliptical Curve Cryptography.

Interested in working with these technologies (and more)? Join us! 

Python 3's Killer Feature: asyncio

A massive debate in the python community about python2/3 has been raging for years. The main reason for python3 to diverge was to provide unambiguous types to handle unicode, strings and bytes (more here), but recently there’s been a bigger divergence that’s gone largely unnoticed. python3.4 introduced the asyncio module and python3.5 gave it a new syntax that is built into the language. Asyncio has done for python what node did for javascript. Asynchronous programming in python enables you to do new and powerful things, but like all technologies there are tradeoffs.

 

Performance

We use uvloop to speed up our asyncio code (link):

“uvloop makes asyncio fast. In fact, it is at least 2x faster than nodejs, gevent, as well as any other Python asynchronous framework. The performance of uvloop-based asyncio is close to that of Go programs.”

Asyncio can significantly improve throughput on the same hardware. You don’t have to use semaphores, you get access to shared memory, and it’s relatively easy to code. However this won’t improve latency since await-ed functions still have to wait for IO. While our await-ed functions wait for IO, control is automatically returned to the event loop so other code can run.

For applications with lots of IO, the savings can be substantial. Imagine a service like pingdom that visits a ton of websites to be sure they’re still up. Serially making HTTP requests and waiting for responses would be painfully slow. Before asyncio, you had to use python’s threading module, which has several limitations.

 

Example Code

Let’s start, by showing what some asynchronous python 3.5+ code looks like:

 

Regular Snippet 

async def get_popular_words(words_to_ignore, histogram_size):

   # make asynchronous call
   data = await slow_function_that_waits_for_io()

   # calculate histogram for data
   histogram = {}
   for word in data.split():
       if word not in words_to_ignore:
           histogram[word] = histogram.get(word, 0) + 1

   # return only top `histogram_size` most common words (and frequency)
   to_return = {}
   for cnt, word in enumerate(sorted(histogram, key=histogram.get, reverse=True)):
       to_return[word] = histogram[word]
       if cnt = histogram_size-1:
           break

   return to_return

 

At first glance, it looks strange (you might say unpythonic!) and is a bit harder to wrap your head around. But actually, it’s just a bunch of normal looking python code with two differences. One is an async in front of the get_histogram method definition, and another is an await in front of the asynchronous slow_function_that_waits_for_io function call.

What’s messier is how to run this code:

Running Your Async Code 

import asyncio
from my_module import import get_popular_words
if __name__ == '__main__':
   coroutine = get_popular_words(
       words_to_ignore=set(['the', 'and', 'to']),
       histogram_size=5,
   )
   popular = asyncio.get_event_loop().run_until_complete(coroutine)
   print(popular)

 

If you’re using a framework like aiohttp or sanic you’ll just follow the default instructions and not have to think about this much. If you dig into the details you’ll have to learn all about coroutines and what’s happening under the hood.

 

Negatives

While this performance sounds great, what does it cost? We’ve found programming with asyncio to have some downsides.

 

Hard to Use the Interpreter

One of the nice things about python is that you can always fire up the interpreter and step through code easily. For asyncio code, you need to run your code using an event loop. In practice, this usually means adding a bunch of print statements to whatever isn’t working and re-running your code.

 

All or Nothing

It’s hard to just use asyncio for the few method(s) that would most benefit from the performance, since you need to run your code with an event loop. You’ll have to jump into the deep end!

In python3’s early days, there was valid concern about library support. Today, python3 has overtaken python2 in library support. However, asyncio is trickier. For example, the popular requests library has chosen to not be asyncio compatible (note: aiohttp is a great asyncio compatible alternative). You’ll probably need a new database adaptor.

 

You Have to Think about the Event Loop

Are you awaiting the result of several actions and then performing some action on that data? If so, you should be using asyncio.gather. Or, perhaps you want to asyncio.wait on future completion? Do you want your future to run_until_complete or run_forever?

Did you forget to put async in front of a function definition that uses await? That will throw an error. Did you forget to put await in front of an asynchronous function? That will return a coroutine that was never invoked, and intentionally not throw an error!

asyncio doesn’t play that nicely with threads, so if you’re using a library that depends on threading (like Cassandra’s python driver) you’re going to have to come up with a workaround.

 

Everything is Harder

Libraries are often buggier and less used/documented (though hopefully that will change over time). You’re not going to find as many helpful error messages on Stack Overflow. Something as easy as writing asynchronous unit tests is non-trivial.

There’s also more opportunities to mess up. For example, when we moved one app to asyncio we forgot to move over a few synchronous API calls that were using the requests library. Since we were running this app on a single thread, those ~100ms API calls were making the whole app hang! It took a while to figure out why.

 

When should I use asyncio?

It depends! With more python3 library support than python2, there’s no good reason to build a new app using python2. It makes sense to use asyncio if your app wastes a lot of cycles waiting on IO, is a good fit for an asynchronous framework (especially websockets), and resource intensive (reduce your server bill).

 

 

This blog post was inspired by Jimmy Song’s recent talk at the NY Python Meetup. Check it out for a more in-depth discussion of asyncio.


Interested in working with these technologies (and more)? Join us! 

Why Take-Home Tests Are Awesome

Written by Raj Nair, VP of Engineering, with contributions from Michael Flaxman, Principal Engineer

 

At Paxos, we give all software engineering candidates a short take-home test before deciding if we want to move forward with an in-person interview. We’ve found it works really well, so we want to share our process and why we think it’s so effective.

Recruiting-Steps.v2-01-01.png
Our hiring process in one easy flow: phone interview, take-home test, in-person interview, and offer!

 

We don’t just give out take-home tests for our own benefit; it saves candidates time as well. Our offer rate after in-person interviews is much higher than if we gave interviews to everyone who passed a basic phone screen; we’re often able to confidently give candidates a yes-or-no answer after only a few hours of in-person interviews. The take-home test eliminates the need to do marathon 6-hour interview days or multiple rounds of interviews.

The biggest advantage of the take-home test, however, is that candidates have the opportunity to showcase their best work under more realistic working scenarios. Our take-home test can be completed on the candidate’s schedule, which is especially useful if their full-time job keeps them busy during work hours. More importantly, we know that writing code live can be nerve-wracking, and it can be hard to do a great job under pressure. If someone aces an in-person code test they’re probably very talented, but conversely if they do a mediocre job, does that mean their skills are mediocre? Perhaps the question was explained in a confusing way, the candidate was thrown off by working on a foreign development environment, exhausted from a day of interviews, tired from a bad night of sleep, freezes on writing code on a whiteboard. The list goes on.

In a competitive market for software engineers, we can’t afford to only hire candidates with perfect in-person interviews. Human communication is a lossy protocol!

When we grade a submission, we’re evaluating it like we would a pull request on our codebase; we expect it to be great! We always make it clear to candidates that we want them to show off their best work. Two engineers grade each submission to ensure we are both thorough and fair. Here are some common things that tend to cause poor code reviews:

 

PerformanceIs your code performant?

Every question we ask has ideal, acceptable and bad performance. One benefit of the take-home is that you have plenty of time to think about performance and revise your work. There’s no excuse for an O(n2) answer to an O(n) problem. Bonus points go to a candidate who explains the performance of their code and how they could improve it if needed.

 

engineering iconREC-09.pngDid you give yourself a code review before you submitted your code?

Since code is generally read many more times than it is written, we expect all code at Paxos to be self-reviewed before submission. Often times, candidates try to solve a problem one way and then change their approach along the way. That’s totally normal, but it’s important to go back and clean up the code so that it makes the most sense given how you ultimately solved the problem. Leaving in unused variables, commented out code and print statements for debugging demonstrates a lack of caring for your craft.

 

engineering iconREC-10.pngDid you use the right tool for each problem?

There’s often multiple correct ways to do something, but you should pick your tools carefully and then use them in the way that best demonstrates their utility.

 

engineering iconREC-11.pngDoes your code flow logically?

This one is hard to define, but you know it when you see it. Generally, shorter code is better code, but this heuristic has a limit. We sometimes get submissions that contain 100 files to solve a problem that can be solved in 30 lines of code (the former may be acceptable if well documented).

 

engineering iconREC-12.pngAre variables and methods well named?

It’s one of the three hard problems in computer science.

 

engineering iconREC-13.pngDid you include clear and concise documentation?

We accept submissions in any language, so wherever there is ambiguity you should be clear about versions, dependencies, platforms and other configuration instructions.

 

engineering iconREC-14.pngDid you include a working example and/or test coverage?

If your submission is perfect, this may not be strictly necessary. But if you made any mistakes, it’s easy to forgive a minor error if you can easily see what the candidate was trying to do.

 

engineering iconREC-16.png

Is your use of commenting appropriate?

Unclear parts of your code should be well commented (or better yet, made clear enough so that they don’t need comments), and obvious parts of your code should not be over-commented.

 

engineering iconREC-15.png

DID YOU FOLLOW THE INSTRUCTIONS?

Each question we ask includes example inputs/outputs. It’s amazing how many submissions we get that don’t meet the spec.



Sometimes candidates do an excellent job on their take-home, but when they come in for an in-person interview they're unable to solve a simple fizzbuzz problem. To combat cases of cheating (most commonly, getting help from friends), we do two things:

  1. Before flying candidates in from other cities, we ask them to solve a simple live-coding question on a Google Hangout. This won’t tell us that they’re extremely talented, but it guarantees a minimal ability level.

  2. During their interview, we ask candidates about their code. If they’re unclear how it works, it’s a major red flag. We sometimes ask them to improve their code on the spot or to build on it to solve other problems. We think it’s fair game to ask more challenging in-depth questions since the candidate has previously had time to think through the problem.


We care about many things when evaluating a candidate, but the most important thing we’re testing in the interview process is work product. There is no better approximation than doing a test that simulates real work, especially when it’s on your own time and in the comfort of your own space. Our problems are designed to simulate the types of challenges you’d be working on at Paxos so we can see how you would think through them.

The strongest objection we get from candidates is that they’re too busy at work and don’t have time for this. Our take-home is short, so that shouldn’t hold you back. We currently ask three questions, each are solved on average in 50 lines of code (some can be done in much fewer!). They’re designed to be fun problems, so if you enjoy writing code you should enjoy the challenge. We’ve been iterating for 9 months on our choice of questions, wording and examples, so we think it’s pretty good.

Though take-home tests aren’t a silver bullet, they’ve been really helpful for us at getting a fuller picture of a candidate’s abilities while saving everyone time in the process. Candidates seem to enjoy the questions, too!


Want to try your hand at our take-home test? Join us!  

Write Fast Apps Using Async Python 3.6 and Redis

One of the common complaints people have about python and other popular interpreted languages (Ruby, JavaScript, PHP, Perl, etc) is that they’re slow. A faster language would give better response times and reduce server costs.

python3.4 introduced the asyncio module and python3.5 gave it a new syntax that is built into the language. Then, along came uvloop with a huge speed increase:


uvloop makes asyncio fast. In fact, it is at least 2x faster than nodejs, gevent, as well as any other Python asynchronous framework. The performance of uvloop-based asyncio is close to that of Go programs.


At Paxos, we make heavy use of asyncio because it’s more performant and also because it’s a better fit for our architecture. Our product synchronizes transaction data from many different clients and is inherently asynchronous because it’s a distributed system.

asyncio is great for maximizing throughput, but we also care about reducing latency. For latency, your database can be a major bottleneck. When using best practices like a proper index and SSDs, a ~10 ms response time for a database query is considered good.

RAM is orders of magnitude faster, but simple key-value stores don’t allow for many of the features software engineers rely on. Multiple indices and basic type checking are very useful. ORDER BY, LIMIT, and OFFSET, are nice to have as well.

We’ve been impressed with redis due to its performance, advanced features and documentation. So, we wrote a redis-backed object mapper called subconscious that relies on asyncio and redis’ powerful primitives. You get the benefits of a database, with the performance of RAM! It’s like a bare-bones version of SQLAlchemy that will never support JOIN operations.

You can define a model like this (for more details, check out the official repo):

class Gender(enum.Enum):
   MALE = 'male'
   FEMALE = 'female'

class User(RedisModel):
   uuid = Column(primary_key=True)
   name = Column(required=True)
   age = Column(index=True, type=int, sort=True, required=True)
   gender = Column(index=True, enum=Gender)
   country_code = Column(index=True)

 

Then you can query and use your model like this:

some_user = await User.load(db, some_uuid)
if some_user.age > 65:
   # apply senior citizen discount
   price -= 10.00

 

You’ll also find lots of features you’ve come to expect from traditional ORMs. Here’s how you’d send a promotion to your US, college-age, male users:

async for user in User.filter_by(
   db=db,
   age=[18, 19, 20, 21, 22],
   country_code='USA',
   gender=Gender.MALE,
):
   await send_promo_email(user)

 

In order to demonstrate how powerful and easy to use this is in practice, we’ve thrown together a pastebin-like website in ~100 lines of code:

http://52.170.82.45/ (source code here)
 

GIF-pastey1.gif
GIF-pastey2.gif

We load-tested this app on Microsoft Azure’s cheapest instance (called an F1, which has 1 core and 2 GB RAM), and the numbers are great:

  • 10k pageviews took ~41s
  • Median pageview was ~19ms
  • 90% of pageviews return in < 22ms

Redis can be saved to disk, but has much weaker persistence vs a traditional database. Transactions and rollbacks are complicated, and there are less guarantees when it comes to data integrity. Some common use-cases:

  • Ephemeral data that can be replayed from a log (so persistence isn’t a big concern)
  • Frequently-updated data: writing to RAM is *much* faster than writing to disk
  • When latency is extremely important
  • When the same data is queried regularly, the volume of data is low, and/or you have a large hardware budget

For low-throughput applications or applications that lend themselves to horizontal scaling, these features may not provide enough benefit to get excited about. But, for high-throughput apps where latency matters, the performance increase can be substantial.

Subconscious is open-sourced using an MIT license, so feel free to use it for your own projects!


Interested in working with these technologies (and more)? Join us! 

dsert - Dictionary Assertion Library

 

I spoke at a recent New York Python Meetup Group event hosted by Stack Overflow that focused on a library we open-sourced called dsert.  At Paxos, we use it for API testing, and you might want to as well. Using dsert we can test all the fields of a python dictionary, even when we don't know all of their values. This ensures that tests are as explicit as possible. You can watch the full video below, or on our Paxos YouTube page.

 

Latest Posts Delivered to your Inbox