Envoy Proxy with Microservices

Introduction

I came across Envoy proxy for the first time a couple of weeks ago, when one of my blog readers suggested that I write an article about it. I had never heard of it before, and my first thought was that it was not my area of expertise. Indeed, this tool is not as popular as competitors like nginx or HAProxy, but it offers some interesting features, among them out-of-the-box support for MongoDB and Amazon RDS, flexibility around discovery and load balancing, and a lot of useful traffic statistics. OK, we know a little about its advantages, but what exactly is Envoy proxy? ‘Envoy is an open source edge and service proxy, designed for cloud-native applications’. It was originally developed by Lyft as a high-performance C++ distributed proxy designed for standalone services and applications, as well as for large microservices service meshes. That sounds really good. That’s why I decided to take a closer look at it and prepare a sample of service discovery and distributed tracing implemented with Envoy and microservices based on Spring Boot.

Envoy Configuration

In most of my previous samples based on Spring Cloud I have used Zuul as the edge service and proxy. Zuul is a popular Netflix OSS tool acting as an API gateway in a microservices architecture. As it turns out, it can be successfully replaced by Envoy proxy. One of the things I really like about Envoy is the way its configuration is created. The default format is JSON and it is validated against a JSON schema. The JSON properties and schema are well documented and easy to understand. As you would expect from a modern solution, the recommended way to get started is with the pre-built Docker images. So, at the beginning we have to create a Dockerfile for building a Docker image with Envoy and provide a configuration file in JSON format. Here’s my Dockerfile. The parameters service-cluster and service-node are optional and relate to the service discovery configuration, which I’ll say more about in a minute.

FROM lyft/envoy:latest
RUN apt-get update
COPY envoy.json /etc/envoy.json
CMD /usr/local/bin/envoy -c /etc/envoy.json --service-cluster samplecluster --service-node sample1

I assume you have basic knowledge of Docker and its commands, which is mandatory at this point. After providing the envoy.json configuration file we can proceed with building the Docker image.

docker build -t envoy:v1 .

Then just run it using the docker run command. The relevant ports should be exposed outside the container.

docker run -d --name envoy -p 9901:9901 -p 10000:10000 envoy:v1

The first pretty helpful feature is the local HTTP administration server. It can be configured in the JSON file inside the admin property. For this example I selected port 9901 and, as you probably noticed, I also exposed that port outside the Envoy Docker container. Now the admin console is available under http://192.168.99.100:9901/. If you invoke that address, it prints all available commands. For me the most helpful were stats, which prints all the important statistics related to the proxy, and logging, where I could dynamically change the logging level for some of the defined categories. So, if you have any problems with Envoy, try changing the logging level by calling /logging?name=level and watch the output after running the docker logs envoy command.

"admin": {
    "access_log_path": "/tmp/admin_access.log",
    "address": "tcp://0.0.0.0:9901"
}

The next required configuration property is listeners. There we define the routing settings and the address on which Envoy will listen for incoming TCP connections. The notation tcp://0.0.0.0:10000 is a wildcard match for any IPv4 address on port 10000. This port is also exposed outside the Envoy Docker container, so in this case it will be our API gateway, available under the address http://192.168.99.100:10000/. We will come back to the proxy configuration details at a later stage; now let’s take a closer look at the architecture of the presented example.

"listeners": [{
    "address": "tcp://0.0.0.0:10000",
    ...
}]

Architecture

The architecture of the described solution is shown in the figure below. We have Envoy proxy as the API gateway, which is the entry point to our system. Envoy integrates with Zipkin and sends it tracing messages with information about incoming HTTP requests and the responses sent back. Two sample microservices, Person and Product, register themselves in the service discovery on startup and deregister on shutdown. They are hidden from external clients behind the API gateway. Envoy has to fetch the current configuration with the addresses of the registered services and route incoming HTTP requests properly. If there are multiple instances of a service available, it should also perform load balancing.

[Figure: architecture of the sample system – Envoy as API gateway, service discovery, Zipkin and the Person and Product microservices]

As it turns out, Envoy does not integrate with well-known discovery servers like Consul or ZooKeeper, but defines its own generic REST-based API, which needs to be implemented to enable fetching of cluster members. The main method of this API is GET /v1/registration/:service, used for fetching the list of currently registered instances of a service. Lyft provides a default implementation in Python, but for this example we will develop our own solution using Java and Spring Boot. The sample application source code is available on GitHub. In addition to the service discovery implementation, you will also find the two sample microservices there.

Service Discovery

Our custom discovery implementation does nothing more than expose a REST-based API with methods for registration, unregistration and fetching a service’s instances. The GET method needs to return a JSON structure matching the following schema.

{
    "hosts": [{
        "ip_address": "...",
        "port": "...",
        ...
    }]
}
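
The DiscoveryHost and DiscoveryHosts classes used by the controller below are simple POJOs mapping that structure. The sketch below is only an assumption based on the JSON schema above (the real classes are in the GitHub repository); note the Jackson annotation used to keep the snake_case key that Envoy expects.

// DiscoveryHost.java - sketch; field names assumed from the JSON schema above
public class DiscoveryHost {

    @JsonProperty("ip_address") // serialize as the snake_case key Envoy expects
    private String ipAddress;
    private int port;

    public String getIpAddress() { return ipAddress; }
    public void setIpAddress(String ipAddress) { this.ipAddress = ipAddress; }
    public int getPort() { return port; }
    public void setPort(int port) { this.port = port; }
}

// DiscoveryHosts.java - wrapper returned by the GET endpoint
public class DiscoveryHosts {

    private List<DiscoveryHost> hosts = new ArrayList<>();

    public List<DiscoveryHost> getHosts() { return hosts; }
    public void setHosts(List<DiscoveryHost> hosts) { this.hosts = hosts; }
}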

Here’s the REST controller class with the discovery API implementation.

@RestController
public class EnvoyDiscoveryController {

    private static final Logger LOGGER = LoggerFactory.getLogger(EnvoyDiscoveryController.class);

    private Map<String, List<DiscoveryHost>> hosts = new HashMap<>();

    @GetMapping(value = "/v1/registration/{serviceName}")
    public DiscoveryHosts getHostsByServiceName(@PathVariable("serviceName") String serviceName) {
        LOGGER.info("getHostsByServiceName: service={}", serviceName);
        DiscoveryHosts hostsList = new DiscoveryHosts();
        hostsList.setHosts(hosts.get(serviceName));
        LOGGER.info("getHostsByServiceName: hosts={}", hostsList);
        return hostsList;
    }

    @PostMapping("/v1/registration/{serviceName}")
    public void addHost(@PathVariable("serviceName") String serviceName, @RequestBody DiscoveryHost host) {
        LOGGER.info("addHost: service={}, body={}", serviceName, host);
        List<DiscoveryHost> tmp = hosts.get(serviceName);
        if (tmp == null)
            tmp = new ArrayList<>();
        tmp.add(host);
        hosts.put(serviceName, tmp);
    }

    @DeleteMapping("/v1/registration/{serviceName}/{ipAddress}")
    public void deleteHost(@PathVariable("serviceName") String serviceName, @PathVariable("ipAddress") String ipAddress) {
        LOGGER.info("deleteHost: service={}, ip={}", serviceName, ipAddress);
        List<DiscoveryHost> tmp = hosts.get(serviceName);
        if (tmp != null) {
            Optional<DiscoveryHost> optHost = tmp.stream().filter(it -> it.getIpAddress().equals(ipAddress)).findFirst();
            if (optHost.isPresent())
                tmp.remove(optHost.get());
            hosts.put(serviceName, tmp);
        }
    }

}
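
One small caveat: the hosts map above is a plain HashMap, while registrations and deregistrations can arrive concurrently from many service instances. This is fine for a demo, but a slightly safer variant (my suggestion, not part of the original sample; it needs the java.util.concurrent imports) could look like this:

    private final Map<String, List<DiscoveryHost>> hosts = new ConcurrentHashMap<>();

    @PostMapping("/v1/registration/{serviceName}")
    public void addHost(@PathVariable("serviceName") String serviceName, @RequestBody DiscoveryHost host) {
        // computeIfAbsent creates the per-service list atomically;
        // CopyOnWriteArrayList tolerates concurrent reads from getHostsByServiceName
        hosts.computeIfAbsent(serviceName, name -> new CopyOnWriteArrayList<>()).add(host);
    }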

Let’s get back to the Envoy configuration settings. Assuming we have built an image from the Dockerfile visible below and then run the container on the default port, we can invoke it under the address http://192.168.99.100:9200. That address should be placed in the envoy.json configuration file. The service discovery connection settings should be provided inside the cluster_manager section.

FROM openjdk:alpine
MAINTAINER Piotr Minkowski <piotr.minkowski@gmail.com>
ADD target/envoy-discovery.jar envoy-discovery.jar
ENTRYPOINT ["java", "-jar", "/envoy-discovery.jar"]
EXPOSE 9200

Here’s a fragment of the envoy.json file. The cluster used for service discovery should be defined as a global SDS configuration, which must be specified inside the sds property (1). The most important thing is to provide a correct URL (2); based on it, Envoy automatically tries to call the endpoint GET /v1/registration/{service_name}. The last interesting configuration field in that section is refresh_delay_ms (3), which sets the delay between fetches of the list of services registered in the discovery server. That’s not all. We also have to define the cluster members. They are identified by name (4). Their type is sds (5), which means that this cluster uses the service discovery server to locate the network addresses of the microservice whose name is defined in the service_name property (6).

"cluster_manager": {
    "clusters": [{
        "name": "service1", (4)
        "type": "sds", // (5)
	"connect_timeout_ms": 5000,
	"lb_type": "round_robin",
	"service_name": "person-service" // (6)
    }, {
        "name": "service2",
        "type": "sds",
        "connect_timeout_ms": 5000,
        "lb_type": "round_robin",
        "service_name": "product-service"
    }],
    "sds": { // (1)
	"cluster": {
		"name": "service_discovery",
		"type": "strict_dns",
		"connect_timeout_ms": 5000,
		"lb_type": "round_robin",
		"hosts": [{
			"url": "tcp://192.168.99.100:9200" // (2)
		}]
	},
	"refresh_delay_ms": 3000 // (3)
    }
}

Routing configuration is defined for every single listener inside the route_config property (1). The first route is configured for person-service, which is handled by the cluster service1 (2), and the second for product-service, handled by the service2 cluster (3). So, our services are available under the addresses http://192.168.99.100:10000/person and http://192.168.99.100:10000/product.

{
    "name": "http_connection_manager",
    "config": {
        "codec_type": "auto",
        "stat_prefix": "ingress_http",
        "route_config": { // (1)
            "virtual_hosts": [{
                "name": "service",
                "domains": ["*"],
                "routes": [{
                    "prefix": "/person", // (2)
                    "cluster": "service1"
                }, {
                    "prefix": "/product", // (3)
                    "cluster": "service2"
                }]
            }]
        },
        "filters": [{
            "name": "router",
            "config": {}
        }]
    }
}

Building Microservices

The routing on the Envoy proxy has already been configured. We still don’t have any running microservices. Their implementation is based on the Spring Boot framework; they do nothing more than expose a REST API providing simple operations on a list of objects and register/unregister the service in the discovery server. Here’s the @Service bean responsible for that registration. The onApplicationEvent method is fired after application startup, and the destroy method just before a graceful shutdown.

@Service
public class PersonRegister implements ApplicationListener<ApplicationReadyEvent> {

    private static final Logger LOGGER = LoggerFactory.getLogger(PersonRegister.class);

    private String ip;
    @Value("${server.port}")
    private int port;
    @Value("${spring.application.name}")
    private String appName;
    @Value("${envoy.discovery.url}")
    private String discoveryUrl;

    @Autowired
    RestTemplate template;

    @Override
    public void onApplicationEvent(ApplicationReadyEvent event) {
        LOGGER.info("PersonRegistration.register");
        try {
            ip = InetAddress.getLocalHost().getHostAddress();
            DiscoveryHost host = new DiscoveryHost();
            host.setPort(port);
            host.setIpAddress(ip);
            template.postForObject(discoveryUrl + "/v1/registration/{service}", host, DiscoveryHosts.class, appName);
        } catch (Exception e) {
            LOGGER.error("Error during registration", e);
        }
    }

    @PreDestroy
    public void destroy() {
        try {
            template.delete(discoveryUrl + "/v1/registration/{service}/{ip}", appName, ip);
            LOGGER.info("PersonRegister.unregistered: service={}, ip={}", appName, ip);
        } catch (Exception e) {
            LOGGER.error("Error during unregistration", e);
        }
    }

}
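
The RestTemplate injected into the bean above must be available in the Spring context. If it is not declared anywhere else in the project, a minimal configuration class like the one below will do (a sketch; the class name is arbitrary):

@Configuration
public class RestClientConfig {

    @Bean
    public RestTemplate restTemplate() {
        return new RestTemplate();
    }
}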

The best way to shut down a Spring Boot application gracefully is through its Actuator endpoint. To enable such endpoints for the service, include spring-boot-starter-actuator in your project dependencies. The shutdown endpoint is disabled by default, so we should add the following properties to application.yml to enable it and additionally disable the default security (endpoints.shutdown.sensitive=false). Now, just by calling POST /shutdown we can stop our Spring Boot application and test the unregister method.

endpoints:
  shutdown:
    enabled: true
    sensitive: false

As before, we also build Docker images for the microservices. Here’s the person-service Dockerfile, which lets you override the default discovery server (SDS) address through the DISCOVERY_URL environment variable.

FROM openjdk:alpine
MAINTAINER Piotr Minkowski <piotr.minkowski@gmail.com>
ADD target/person-service.jar person-service.jar
ENV DISCOVERY_URL http://192.168.99.100:9200
ENTRYPOINT ["java", "-jar", "/person-service.jar"]
EXPOSE 9300

To build the image and run a container of the service with a custom published port, type the following Docker commands.

docker build -t piomin/person-service .
docker run -d --name person-service -p 9301:9300 piomin/person-service
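
The person-service API itself is not the focus of this article, but to make the later calls to POST /person and GET /person easier to picture, here is a minimal sketch of what such a controller could look like. The Person class with id and name fields is an assumption made for illustration; the actual implementation is in the GitHub repository.

@RestController
@RequestMapping("/person")
public class PersonController {

    // simple in-memory store, sufficient for this demo
    private final List<Person> persons = new ArrayList<>();

    @PostMapping
    public Person add(@RequestBody Person person) {
        person.setId(persons.size() + 1);
        persons.add(person);
        return person;
    }

    @GetMapping
    public List<Person> findAll() {
        return persons;
    }
}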

Distributed Tracing

It is time for the last piece of the puzzle: Zipkin tracing. Statistics about all incoming requests should be sent there. The first part of the Envoy proxy configuration is the tracing property, which specifies global settings for the HTTP tracer.

"tracing": {
    "http": {
        "driver": {
            "type": "zipkin",
            "config": {
                "collector_cluster": "zipkin",
                "collector_endpoint": "/api/v1/spans"
            }
        }
    }
}

The network location and settings for the Zipkin connection should be defined as a cluster member.

"clusters": [{
    "name": "zipkin",
    "connect_timeout_ms": 5000,
    "type": "strict_dns",
    "lb_type": "round_robin",
    "hosts": [
      {
        "url": "tcp://192.168.99.100:9411"
      }
    ]
}]

We should also add a new section, tracing, to the HTTP connection manager configuration (1). The operation_name field (2) is required and sets the span name; only the values ‘ingress’ and ‘egress’ are supported.

"listeners": [{
	"filters": [{
        "name": "http_connection_manager",
        "config": {
			"tracing": { // (1)
				"operation_name": "ingress" // (2)
			}
			// ...
		}
	}]
}]

The Zipkin server can be started using its Docker image.

docker run -d --name zipkin -p 9411:9411 openzipkin/zipkin

Summary

Here’s the list of Docker containers running for this test. As you probably remember, we have Zipkin, Envoy, the custom discovery server, two instances of person-service and one of product-service. You can add some person objects by calling POST /person and then display the list of all persons by calling GET /person. The requests should be load balanced between the two instances based on the entries in the service discovery.

[Figure: list of Docker containers running for the test]

Information about every request is sent to Zipkin with the service name taken from the --service-cluster parameter passed when running Envoy.

[Figure: request traces visible in Zipkin]


Part 2: Creating microservices – monitoring with Spring Cloud Sleuth, ELK and Zipkin

One of the most frequently mentioned challenges related to building a microservices-based architecture is monitoring. Each microservice should run in an environment isolated from the other microservices, so it does not share resources such as databases or log files with them. However, an essential requirement for a microservices architecture is relatively easy access to the call history, including the ability to look through the request propagation between multiple microservices. Grepping the logs is not the right solution to that problem. There are some helpful tools which can be used when creating microservices with the Spring Boot and Spring Cloud frameworks.

Spring Cloud Sleuth – a library available as part of the Spring Cloud project. It lets you track requests as they propagate through subsequent microservices by adding the appropriate headers to the HTTP requests. The library is based on the MDC (Mapped Diagnostic Context) concept, where you can easily extract values put into the context and display them in the logs.

Zipkin – a distributed tracing system that helps gather timing data for every request propagated between independent services. It has a simple management console where we can find a visualization of the time statistics generated by subsequent services.

ELK – Elasticsearch, Logstash, Kibana: three different tools usually used together. They are used for searching, analyzing and visualizing log data in real time.

Probably many of you, even if you have not worked with Java or microservices before, have heard about Logstash and Kibana. For example, if you look at hub.docker.com, you will find images for the above tools among the most popular ones. In our example, we will simply use those images. Let’s begin by running a container with Elasticsearch.

docker run -d -it --name es -p 9200:9200 -p 9300:9300 elasticsearch

Then we can run the Kibana container and link it to Elasticsearch.

docker run -d -it --name kibana --link es:elasticsearch -p 5601:5601 kibana

Finally, we will start Logstash with its input and output declared. As the input we declare TCP, which is compatible with LogstashTcpSocketAppender, used as the logging appender in our sample application. As the output, Elasticsearch has been declared. Each microservice’s logs will be indexed under its name with the micro prefix.

docker run -d -it --name logstash -p 5000:5000 logstash -e 'input { tcp { port => 5000 codec => "json" } } output { elasticsearch { hosts => ["192.168.99.100"] index => "micro-%{serviceName}"} }'

Now we can take a look at the sample microservices. This post is a continuation of my previous article, Part 1: Creating microservice using Spring Cloud, Eureka and Zuul. The architecture and exposed services are the same as in the previous sample. The source code is available on GitHub (branch logstash). As I mentioned before, we will use the Logback library for sending log data to Logstash. In addition to the three Logback dependencies, we also add libraries for Zipkin integration and the Spring Cloud Sleuth starter. Here’s the relevant fragment of the microservice’s pom.xml.

		<dependency>
			<groupId>org.springframework.cloud</groupId>
			<artifactId>spring-cloud-starter-sleuth</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.cloud</groupId>
			<artifactId>spring-cloud-sleuth-zipkin</artifactId>
		</dependency>
		<dependency>
			<groupId>net.logstash.logback</groupId>
			<artifactId>logstash-logback-encoder</artifactId>
			<version>4.9</version>
		</dependency>
		<dependency>
			<groupId>ch.qos.logback</groupId>
			<artifactId>logback-classic</artifactId>
			<version>1.2.3</version>
		</dependency>
		<dependency>
			<groupId>ch.qos.logback</groupId>
			<artifactId>logback-core</artifactId>
			<version>1.2.3</version>
		</dependency>

There is also a Logback configuration file in the src/main/resources directory. Here’s a fragment of logback.xml. We can configure which logging fields are sent to Logstash by declaring tags such as mdc, logLevel, message etc. We also append a service name field, which is used for the Elasticsearch index creation.

	<appender name="STASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
		<destination>192.168.99.100:5000</destination>

		<encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
			<providers>
				<mdc />
				<context />
				<logLevel />
				<loggerName />

				<pattern>
					<pattern>
						{
						"serviceName": "account-service"
						}
					</pattern>
				</pattern>

				<threadName />
				<message />
				<logstashMarkers />
				<stackTrace />
			</providers>
		</encoder>
	</appender>

The configuration of Spring Cloud Sleuth is very simple. We only have to add the spring-cloud-starter-sleuth dependency to pom.xml and declare a sampler @Bean. In the sample I declared AlwaysSampler, which exports every span, but there is also another option, PercentageBasedSampler, which samples a fixed fraction of spans.

	@Bean
	public AlwaysSampler defaultSampler() {
	  return new AlwaysSampler();
	}
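
If exporting every span is too heavy for your environment, you can declare the percentage-based sampler instead. The snippet below assumes the Sleuth 1.x API, where PercentageBasedSampler is configured through SamplerProperties:

	@Bean
	public Sampler defaultSampler() {
	  SamplerProperties properties = new SamplerProperties();
	  // export roughly 10% of spans instead of all of them
	  properties.setPercentage(0.1f);
	  return new PercentageBasedSampler(properties);
	}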

After starting the ELK Docker containers we need to run our microservices. There are five Spring Boot applications which need to be run: discovery-service, account-service, customer-service, gateway-service and zipkin-service. After launching all of them we can try calling some services, for example http://localhost:8765/api/customer/customers/{id}, which causes calls to both the customer and the account service. All logs will be stored in Elasticsearch under the micro-%{serviceName} index. They can be searched in Kibana with the micro-* index pattern. Index patterns are created in Kibana under Management -> Index patterns. Kibana is available under the address http://192.168.99.100:5601. On the first run we will be prompted for an index pattern, so let’s type micro-*. Under the Discover section we can take a look at all logs matching the typed pattern, together with a timeline visualization.

[Figure: Kibana Discover view showing logs matching the micro-* pattern with a timeline visualization]

Kibana is a rather intuitive and user-friendly tool. I will not describe in detail how to use Kibana, because you can easily find that out yourself by reading the documentation or just clicking through the UI. The most important thing is to be able to search the logs by filtering criteria. In the picture below there is an example of searching logs by the X-B3-TraceId field, which is added to the request headers by Spring Cloud Sleuth. Sleuth also adds X-B3-SpanId for marking the request within a single microservice. We can select which fields are displayed in the result list; in this sample I selected message and serviceName, as you can see in the left pane of the picture.

[Figure: searching logs in Kibana by the X-B3-TraceId field]

Here’s a picture with the details of a single request. It is visible after expanding a log row.

[Figure: details of a single log entry in Kibana]

Spring Cloud Sleuth also sends statistics to Zipkin. That is a different kind of data than what is stored in Logstash: timing statistics for each request. The Zipkin UI is really simple. You can filter the requests by criteria like time, service name or endpoint name. Here’s a picture with the same requests which were visualized with Kibana: http://localhost:8765/api/customer/customers/{id}.

[Figure: list of traces in the Zipkin UI]

We can always see the details of each request by clicking on it. Then we see a picture similar to the one visible below. In the beginning, the request was processed on the API gateway. Then the gateway discovered the customer service on the Eureka server and called that service. The customer service in turn had to discover the account service and then call it. In this view you can easily find out which operation is the most time-consuming.

[Figure: details of a single trace in Zipkin showing the gateway, customer-service and account-service spans]