Generating large PDF files using JasperReports

During the last ‘Code Europe’ conference in Warsaw appeared many topics related to microservices architecture. Several times I heard the conclusion that the best candidate for separation from monolith is service that generates PDF reports. It’s usually quite independent from the other parts of application. I can see a similar approach in my organization, where first microservice running in production mode was the one that generates PDF reports. To my surprise, the vendor which developed that microservice had to increase maximum heap size to 1GB on each of its instances. This has forced me to take a closer look at the topic of PDF reports generation process.
The most popular Java library for creating PDF files is JasperReports. During generation process, this library by default stores all objects in RAM memory. If such reports are large, this could be a problem my vendor encountered. Their solution, as I have mentioned before, was to increase the maximum size of Java heap ūüôā

This time, unlike usual, I’m going to start with the test implementation. Here’s simple JUnit test with 20 requests per second sending to service endpoint.

public class JasperApplicationTest {

	protected Logger logger = Logger.getLogger(JasperApplicationTest.class.getName());
	TestRestTemplate template = new TestRestTemplate();

	@Test
	public void testGetReport() throws InterruptedException {
		List<HttpStatus> responses = new ArrayList<>();
		Random r = new Random();
		int i = 0;
		for (; i < 20; i++) {
			new Thread(new Runnable() {
				@Override
				public void run() {
					int age = r.nextInt(99);
					long start = System.currentTimeMillis();
					ResponseEntity<InputStreamResource> res = template.getForEntity("http://localhost:2222/pdf/{age}", InputStreamResource.class, age);
					logger.info("Response (" +  (System.currentTimeMillis()-start) + "): " + res.getStatusCode());
					responses.add(res.getStatusCode());
					try {
						Thread.sleep(50);
					} catch (InterruptedException e) {
						e.printStackTrace();
					}
				}
			}).start();
		}

		while (responses.size() != i) {
			Thread.sleep(500);
		}
		logger.info("Test finished");
	}
}

In my test scenario I inserted about 1M records into the person table. Everything works fine during running test. Generated files had about 500kb size and 200 pages. All requests were succeeded and each of them had been processed about 8 seconds. In comparison with single request which had been processed 4 seconds it seems to be a good result. The situation with RAM is worse as you can see in the figure below. After generating 20 PDF reports allocated heap size increases to more than 1GB and used heap size was about 550MB. Also CPU usage during report generation increased to 100% usage. I could easily image generating files bigger than 500kb in the production mode…

jasper-1

In our situation we have two options. We can always add more RAM memory or … look for another choice ūüôā Jasper library comes with solution – Virtualizers. The virtualizer cuts the jasper report print into different files and save them on the hard drive and/or compress it. There are three types of virtualizers:
JRFileVirtualizer, JRSwapFileVirtualizer and JRGzipVirtualizer. You can read more about them here. Now, look at the figure below. Here’s illustration of memory and CPU usage for the test with JRFileVirtualizer. It looks a little better than the previous figure, but it does not knock us down ūüôā However, requests with the same overload as for the previous test take much longer – about 30 seconds. It’s not a good message, but at least the heap size allocation is not increases as fast as for previous sample.

jasper-2

Same test has been performed for JRSwapFileVirtualizer. The requests was average processed around 10 seconds. The graph illustrating CPU and memory usage is rather more similar to in memory test than JRFileVirtualizer test.

jasper-3

To see the difference between those three scenarios we have to run our application with maximum heap size set. For my tests I set -Xmx128m -Xms128m. For test with file virtualizers we receive HTTP responses with PDF reports, but for in memory tests the exception is thrown by the sample application: java.lang.OutOfMemoryError: GC overhead limit exceeded.

For testing purposes I created Spring Boot application. Sample source code is available as usual on GitHub. Here’s full list of Maven dependencies for that project.

<dependency>
	<groupId>net.sf.jasperreports</groupId>
	<artifactId>jasperreports</artifactId>
	<version>6.4.0</version>
</dependency>
<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-test</artifactId>
	<scope>test</scope>
</dependency>
<dependency>
	<groupId>mysql</groupId>
	<artifactId>mysql-connector-java</artifactId>
	<scope>runtime</scope>
</dependency>

Here’s application main class. There are @Bean declarations of file virtualizers and JasperReport which is responsible for template compilation from .jrxml file. To run application for testing purposes type java -jar -Xms64m -Xmx128m -Ddirectory=C:\Users\minkowp\pdf sample-jasperreport-boot.jar.

@SpringBootApplication
public class JasperApplication {

	@Value("${directory}")
	private String directory;

	public static void main(String[] args) {
		SpringApplication.run(JasperApplication.class, args);
	}

	@Bean
	JasperReport report() throws JRException {
		JasperReport jr = null;
		File f = new File("personReport.jasper");
		if (f.exists()) {
			jr = (JasperReport) JRLoader.loadObject(f);
		} else {
			jr = JasperCompileManager.compileReport("src/main/resources/report.jrxml");
			JRSaver.saveObject(jr, "personReport.jasper");
		}
		return jr;
	}

	@Bean
	JRFileVirtualizer fileVirtualizer() {
		return new JRFileVirtualizer(100, directory);
	}

	@Bean
	JRSwapFileVirtualizer swapFileVirtualizer() {
		JRSwapFile sf = new JRSwapFile(directory, 1024, 100);
		return new JRSwapFileVirtualizer(20, sf, true);
	}

}

There are three endpoints exposed for the tests:
/pdf/{age} – in memory PDF generation
/pdf/fv/{age} – PDF generation with JRFileVirtualizer
/pdf/sfv/{age} – PDF generation with JRSwapFileVirtualizer

Here’s method generating PDF report. Report is generated in fillReport static method from JasperFillManager. It takes three parameters as input: JasperReport which encapsulates compiled .jrxml template file, JDBC connection object and map of parameters. Then report is ganerated and saved on disk as a PDF file. File is returned as an attachement in the response.

	private ResponseEntity<InputStreamResource> generateReport(String name, Map<String, Object> params) {
		FileInputStream st = null;
		Connection cc = null;
		try {
			cc = datasource.getConnection();
			JasperPrint p = JasperFillManager.fillReport(jasperReport, params, cc);
			JRPdfExporter exporter = new JRPdfExporter();
			SimpleOutputStreamExporterOutput c = new SimpleOutputStreamExporterOutput(name);
			exporter.setExporterInput(new SimpleExporterInput(p));
			exporter.setExporterOutput(c);
			exporter.exportReport();

			st = new FileInputStream(name);
			HttpHeaders responseHeaders = new HttpHeaders();
			responseHeaders.setContentType(MediaType.valueOf("application/pdf"));
			responseHeaders.setContentDispositionFormData("attachment", name);
			responseHeaders.setContentLength(st.available());
		    return new ResponseEntity<InputStreamResource>(new InputStreamResource(st), responseHeaders, HttpStatus.OK);
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			fv.cleanup();
			sfv.cleanup();
			if (cc != null)
				try {
					cc.close();
				} catch (SQLException e) {
					e.printStackTrace();
				}
		}
		return null;
	}

To enable virtualizer during report generation we only have to pass one parameter to the map of parameters – instance of virtualizer object.

	@Autowired
	JRFileVirtualizer fv;
	@Autowired
	JRSwapFileVirtualizer sfv;
	@Autowired
	DataSource datasource;
	@Autowired
	JasperReport jasperReport;

	@ResponseBody
	@RequestMapping(value = "/pdf/fv/{age}")
	public ResponseEntity<InputStreamResource> getReportFv(@PathVariable("age") int age) {
		logger.info("getReportFv(" + age + ")");
		Map<String, Object> m = new HashMap<>();
		m.put(JRParameter.REPORT_VIRTUALIZER, fv);
		m.put("age", age);
		String name = ++count + "personReport.pdf";
		return generateReport(name, m);
	}

Template file report.jrxml is available under /src/main/resources directory. Inside queryString tag there is SQL query which takes age parameter in WHERE statement. There are also five columns declared all taken from SQL query result.

<?xml version = "1.0" encoding = "UTF-8"?>
<!DOCTYPE jasperReport PUBLIC "//JasperReports//DTD Report Design//EN"    "http://jasperreports.sourceforge.net/dtds/jasperreport.dtd">

<jasperReport xmlns="http://jasperreports.sourceforge.net/jasperreports"               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"               xsi:schemaLocation="http://jasperreports.sourceforge.net/jasperreports    http://jasperreports.sourceforge.net/xsd/jasperreport.xsd"               name="report2" pageWidth="595" pageHeight="842"                columnWidth="555" leftMargin="20" rightMargin="20"               topMargin="20" bottomMargin="20">
    <parameter name="age" class="java.lang.Integer"/>
    <queryString>
        <![CDATA[SELECT * FROM person WHERE age = $P{age}]]>
    </queryString>
    <field name="id" class="java.lang.Integer" />
    <field name="first_name" class="java.lang.String" />
    <field name="last_name" class="java.lang.String" />
    <field name="age" class="java.lang.Integer" />
    <field name="pesel" class="java.lang.String" />

    <detail>
        <band height="15">

            <textField>
                <reportElement x="0" y="0" width="50" height="15" />

                <textElement textAlignment="Right" verticalAlignment="Middle"/>

                <textFieldExpression class="java.lang.Integer">
                    <![CDATA[$F{id}]]>
                </textFieldExpression>
            </textField>       

            <textField>
                <reportElement x="100" y="0" width="80" height="15" />

                <textElement textAlignment="Left" verticalAlignment="Middle"/>

                <textFieldExpression class="java.lang.String">
                    <![CDATA[$F{first_name}]]>
                </textFieldExpression>
            </textField> 

            <textField>
                <reportElement x="200" y="0" width="80" height="15" />

                <textElement textAlignment="Left" verticalAlignment="Middle"/>

                <textFieldExpression class="java.lang.String">
                    <![CDATA[$F{last_name}]]>
                </textFieldExpression>
            </textField>               

            <textField>
                <reportElement x="300" y="0" width="50" height="15"/>
                <textElement textAlignment="Right" verticalAlignment="Middle"/>

                <textFieldExpression class="java.lang.Integer">
                    <![CDATA[$F{age}]]>
                </textFieldExpression>
            </textField>

           <textField>
                <reportElement x="380" y="0" width="80" height="15" />

                <textElement textAlignment="Left" verticalAlignment="Middle"/>

                <textFieldExpression class="java.lang.String">
                    <![CDATA[$F{pesel}]]>
                </textFieldExpression>
            </textField>         

        </band>
    </detail>

</jasperReport>

And the last thing we have to do is to properly set database connection pool settings. A natural choice for Spring Boot application is Tomcat JDBC pool.

spring:
  application:
    name: jasper-service
  datasource:
    url: jdbc:mysql://192.168.99.100:33306/datagrid?useSSL=false
    username: datagrid
    password: datagrid
    tomcat:
      initial-size: 20
      max-active: 30

Final words

In this article I showed you how to avoid out of memory exception while generating large PDF reports with JasperReports. I compared three solutions: in memory generation and two methods based on cutting the jasper print into different files and save them on the hard drive. For me, the most interesting was the solution based on single swapped file with JRSwapFileVirtualizer. It is slower a little than in memory generation but works faster than similar tests for JRFileVirtualizer and in contrast to in memory generation didn’t avoid out of memory exception for files larger than 500kb with 20 requests per second.

Exposing Microservices over REST Protocol Buffers

Today exposing RESTful API with JSON protocol is the most common standard. We can find many articles describing advantages and disadvantages of JSON versus XML. Both these protocols exchange messages in text format. If an important aspect affecting to the choice of communication protocol in your systems is performance you should definitely pay attention to Protocol Buffers. It is a binary format created by Google as:

A language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.

Protocol Buffers, which is sometimes referred as Protobuf is not only a message format but also a set of language rules that define the structure of messages. It is extremely useful in service to service communication what has been very well described in that article Beating JSON performance with Protobuf. In that example Protobuf was about 5 times faster than JSON for tests based on Spring Boot framework.

Introduction to Protocol Buffers can be found here. My sample is similar to previous samples from my weblog – it is based on two microservices account and customer which calls one of account’s endpoint. Let’s begin from message types definition provided inside .proto file. Place your .proto file in src/main/proto¬†directory. Here’s account.proto defined in account service. We set java_package and java_outer_classname to define package and name of Java generated class. Message definition syntax is pretty intuitive. Account object generated from that file¬†has three properties id, customerId and number. There is also Accounts object which wrappes list of Account objects.

syntax = "proto3";

package model;

option java_package = "pl.piomin.services.protobuf.account.model";
option java_outer_classname = "AccountProto";

message Accounts {
	repeated Account account = 1;
}

message Account {

	int32 id = 1;
	string number = 2;
	int32 customer_id = 3;

}

Here’s .proto file definition from customer service. It a little more complicated than the previous one from account service. In addition to its definitions it contains definitions of account service messages, because they are used by @Feign client.

syntax = "proto3";

package model;

option java_package = "pl.piomin.services.protobuf.customer.model";
option java_outer_classname = "CustomerProto";

message Accounts {
	repeated Account account = 1;
}

message Account {

	int32 id = 1;
	string number = 2;
	int32 customer_id = 3;

}

message Customers {
	repeated Customer customers = 1;
}

message Customer {

	int32 id = 1;
	string pesel = 2;
	string name = 3;
	CustomerType type = 4;
	repeated Account accounts = 5;

	enum CustomerType {
		INDIVIDUAL = 0;
		COMPANY = 1;
	}

}

We generate source code from the message definitions above by using protobuf-maven-plugin maven plugin. Plugin needs to have protocExecutable file location set. It can be downloaded from Google’s Protocol Buffer download site.

<plugin>
	<groupId>org.xolstice.maven.plugins</groupId>
	<artifactId>protobuf-maven-plugin</artifactId>
	<version>0.5.0</version>
	<executions>
		<execution>
			<id>protobuf-compile</id>
			<phase>generate-sources</phase>
			<goals>
				<goal>compile</goal>
			</goals>
			<configuration>
				<outputDirectory>src/main/generated</outputDirectory>
				<protocExecutable>${proto.executable}</protocExecutable>
			</configuration>
		</execution>
	</executions>
</plugin>

Protobuf classes are generated into src/main/generated output directory. Let’s add that source directory to maven sources with build-helper-maven-plugin.

<plugin>
	<groupId>org.codehaus.mojo</groupId>
	<artifactId>build-helper-maven-plugin</artifactId>
	<executions>
		<execution>
			<id>add-source</id>
			<phase>generate-sources</phase>
			<goals>
				<goal>add-source</goal>
			</goals>
			<configuration>
				<sources>
					<source>src/main/generated</source>
				</sources>
			</configuration>
		</execution>
	</executions>
</plugin>

Sample application source code is available on GitHub. Before proceeding to the next steps build application using mvn clean install command. Generated classes are available under src/main/generated and our microservices are ready to run. Now, let me describe some implementation details. We need two dependencies in maven pom.xml to use Protobuf.

<dependency>
	<groupId>com.google.protobuf</groupId>
	<artifactId>protobuf-java</artifactId>
	<version>3.3.1</version>
</dependency>
<dependency>
	<groupId>com.googlecode.protobuf-java-format</groupId>
	<artifactId>protobuf-java-format</artifactId>
	<version>1.4</version>
</dependency>

Then, we need to declare default HttpMessageConverter @Bean and inject it into RestTemplate @Bean.

    @Bean
    @Primary
    ProtobufHttpMessageConverter protobufHttpMessageConverter() {
        return new ProtobufHttpMessageConverter();
    }

    @Bean
    RestTemplate restTemplate(ProtobufHttpMessageConverter hmc) {
        return new RestTemplate(Arrays.asList(hmc));
    }

Here’s REST @Controller code. Account and Accounts from AccountProto generated class are returned as a response body in all three API methods visible below. All objects generated from .proto files have newBuilder method used for creating new object instances. I also set application/x-protobuf as default response content type.

@RestController
public class AccountController {

	@Autowired
	AccountRepository repository;

	protected Logger logger = Logger.getLogger(AccountController.class.getName());

	@RequestMapping(value = "/accounts/{number}", produces = "application/x-protobuf")
	public Account findByNumber(@PathVariable("number") String number) {
		logger.info(String.format("Account.findByNumber(%s)", number));
		return repository.findByNumber(number);
	}

	@RequestMapping(value = "/accounts/customer/{customer}", produces = "application/x-protobuf")
	public Accounts findByCustomer(@PathVariable("customer") Integer customerId) {
		logger.info(String.format("Account.findByCustomer(%s)", customerId));
		return Accounts.newBuilder().addAllAccount(repository.findByCustomer(customerId)).build();
	}

	@RequestMapping(value = "/accounts", produces = "application/x-protobuf")
	public Accounts findAll() {
		logger.info("Account.findAll()");
		return Accounts.newBuilder().addAllAccount(repository.findAll()).build();
	}

}

Method GET /accounts/customer/{customer} is called from customer service using @Feign client.

@FeignClient(value = "account-service")
public interface AccountClient {

    @RequestMapping(method = RequestMethod.GET, value = "/accounts/customer/{customerId}")
    Accounts getAccounts(@PathVariable("customerId") Integer customerId);

}

We can easily test described configuration using JUnit test class visible below.

@SpringBootTest(webEnvironment = WebEnvironment.RANDOM_PORT)
@RunWith(SpringRunner.class)
public class AccountApplicationTest {

	protected Logger logger = Logger.getLogger(AccountApplicationTest.class.getName());

	@Autowired
	TestRestTemplate template;

	@Test
	public void testFindByNumber() {
		Account a = this.template.getForObject("/accounts/{id}", Account.class, "111111");
		logger.info("Account[\n" + a + "]");
	}

	@Test
	public void testFindByCustomer() {
		Accounts a = this.template.getForObject("/accounts/customer/{customer}", Accounts.class, "2");
		logger.info("Accounts[\n" + a + "]");
	}

	@Test
	public void testFindAll() {
		Accounts a = this.template.getForObject("/accounts", Accounts.class);
		logger.info("Accounts[\n" + a + "]");
	}

	@TestConfiguration
	static class Config {

		@Bean
		public RestTemplateBuilder restTemplateBuilder() {
			return new RestTemplateBuilder().additionalMessageConverters(new ProtobufHttpMessageConverter());
		}

	}

}

Conclusion

This article shows how to enable Protocol Buffers for microservices project based on Spring Boot. Protocol Buffer is an alternative to text-based protocols like XML or JSON and surpasses them in terms of performance. Adapt to this protocol using in Spring Boot application is pretty simple. For microservices we can still uses Spring Cloud components like Feign or Ribbon in combination with Protocol Buffers same as with REST over JSON or XML.

Spring Cloud Microservices at Pivotal Platform

Imagine you have multiple microservices running on different machines as multiple instances. It seems natural to think about the tools that helps you¬†in the process of monitoring and managing all of them.¬†If we add that our microservices are¬†created based on the Spring Cloud framework obviously seems we should look at the Pivotal platform. Here is figure with platform’s architecture download from the main Pivotal’s site.

PVDI-Microservices-Architecture

Although Pivotal Platform can run applications written in many languages it has the best support for Spring Cloud Services and Netflix OSS tools like you can see in the figure above. From the possibilities offered by Pivotal we can take advantage of three ways.

Pivotal Cloud Foundry – solution can be ran on public IaaS or private cloud like AWS, Google Cloud Platform, Microsoft Azure, VMware vSphere, OpenStack.

Pivotal Web Services – hosted cloud-native platform available at pivotal.io site.

PCF Dev Рthe instance which can be run locally as a a single virtual machine. It offers the opportunity to develop apps using an offline environment which basic services installed like Spring Cloud Services (SCS), MySQL, Redis databases and RabbitMQ broker. If you want to run it locally with SCS you need more than 6GB RAM free.

As a Spring Cloud Services there are available Circuit Breaker (Hystrix), Service Registry (Eureka) and standard Spring Configuration Server based on git configuration.

scs

That’s all I wanted to say about the theory. Let’s move on to practice. On the Pivotal website we have detailed materials on how to set it up, create and deploy a simple microservice based on Spring Cloud solutions. In this article I will try to present the essence collected from these descriptions based on one of my standard examples from the previous posts. As always sample source code is available on GitHub. If you are interested in detailed description of the sample application, microservices and Spring Cloud read my previous articles:

Part 1: Creating microservice using Spring Cloud, Eureka and Zuul

Part 3: Creating Microservices: Circuit Breaker, Fallback and Load Balancing with Spring Cloud

If you have a lot of free RAM you can install PCF Dev on your local workstation. You need to have Virtual Box installed. Then download and install Cloud Foundry Command Line Interface (CF CLI) and PCF Dev. All is described here. Finally you can run command below and take a small break for coffee. Virtual machine needs to downloaded and started.

cf dev start -s scs

For those who do not have RAM enough (like me) there is Pivotal Web Services platform. It is available here. Before use it you have to register on Pivotal site. The rest of the article is identical for both options.
In comparison to previous examples of Spring Cloud based microservices, we need to make some changes. There is one additional dependency inside every microservice’s pom.xml.

<properties>
	...
	<spring-cloud-services.version>1.4.1.RELEASE</spring-cloud-services.version>
	<spring-cloud.version>Dalston.RELEASE</spring-cloud.version>
</properties>

<dependencies>
	<dependency>
		<groupId>io.pivotal.spring.cloud</groupId>
		<artifactId>spring-cloud-services-starter-service-registry</artifactId>
	</dependency>
	...
</dependencies>

<dependencyManagement>
	<dependencies>
		<dependency>
			<groupId>org.springframework.cloud</groupId>
			<artifactId>spring-cloud-dependencies</artifactId>
			<version>${spring-cloud.version}</version>
			<type>pom</type>
			<scope>import</scope>
		</dependency>
		<dependency>
			<groupId>io.pivotal.spring.cloud</groupId>
			<artifactId>spring-cloud-services-dependencies</artifactId>
			<version>${spring-cloud-services.version}</version>
			<type>pom</type>
			<scope>import</scope>
		</dependency>
	</dependencies>
</dependencyManagement>

We also use Maven Cloud Foundry plugin cf-maven-plugin for application deployment on Pivotal platform. Here is sample for account-service. We run two instances of that microservice with max memory 512MB. Our application name is piomin-account-service.

<plugin>
	<groupId>org.cloudfoundry</groupId>
	<artifactId>cf-maven-plugin</artifactId>
	<version>1.1.3</version>
	<configuration>
		<target>http://api.run.pivotal.io</target>
		<org>piotrminkowski</org>
		<space>development</space>
		<appname>piomin-account-service</appname>
		<memory>512</memory>
		<instances>2</instances>
		<server>cloud-foundry-credentials</server>
	</configuration>
</plugin>

Don’t forget to add credentials configuration into Maven settings.xml file.

<server>
	<id>cloud-foundry-credentials</id>
	<username>piotr.minkowski@gmail.com</username>
	<password>***</password>
</server>

Now, when building sample application we to append cf:push command.

mvn clean install cf:push

Here is circuit breaker implementation inside customer-service.

@Service
public class AccountService {

	@Autowired
	private AccountClient client;

	@HystrixCommand(fallbackMethod = "getEmptyList")
	public List<Account> getAccounts(Integer customerId) {
		return client.getAccounts(customerId);
	}

	List<Account> getEmptyList(Integer customerId) {
		return new ArrayList<>();
	}

}

There is randomly generated delay on the account’s service side, so 25% of calls circuit breaker should be activated.

@RequestMapping("/accounts/customer/{customer}")
public List<Account> findByCustomer(@PathVariable("customer") Integer customerId) {
	logger.info(String.format("Account.findByCustomer(%s)", customerId));
	Random r = new Random();
	int rr = r.nextInt(4);
	if (rr == 1) {
		try {
			Thread.sleep(2000);
		} catch (InterruptedException e) {
			e.printStackTrace();
		}
	}
	return accounts.stream().filter(it -> it.getCustomerId().intValue() == customerId.intValue())
		.collect(Collectors.toList());
}

After successfully deploying application using Maven cf:push command we can go to Pivotal Web Services console available at https://console.run.pivotal.io/. Here are our two deployed services: two instances of piomin-account-service and one instance of piomin-customer-service.

pivotal-1

I have also activated Circuit Breaker and Service Registry from Marketplace.

pivotal-2

Every application need to be bound to service. To enable it select service, then expand Bound Apps overlap and select checkbox next to each service name.

pivotal-4

After this step applications needs to be restarted. It also can be be using web dashboard inside each service.

pivotal-5

Finally, all services are registered in Eureka and we can perform some tests using customer endpoint https://piomin-customer-service.cfapps.io/customers/{id}.

pivotal-4

Final words

With Pivotal solution we can easily deploy, scale and monitor our microservices. Deployment and scaling can be done using Maven plugin or via web dashboard. On Pivotal there are also available some services prepared especially for microservices needs like service registry, circuit breaker and configuration server. Pivotal is a competition for such solutions like Kubernetes which based on Docker containerization (more about this tools here). It is especially useful if you are creating a microservices based on Spring Boot and Spring Cloud frameworks.

Part 3: Creating Microservices: Circuit Breaker, Fallback and Load Balancing with Spring Cloud

Probably you read some articles about Hystrix and you know in what purpose it is used for. Today I would like to show you an example of exactly how to use it, which gives you the ability to combine with other tools from Netflix OSS stack like Feign and Ribbon. In this I assume that you have basic knowledge on topics such as microservices, load balancing, service discovery. If not I suggest you read some articles about it, for example my short introduction to microservices architecture available here: Part 1: Creating microservice using Spring Cloud, Eureka and Zuul. The code sample used in that article is also also used now. There is also sample source code available on GitHub. For the sample described now see hystrix branch, for basic sample master branch. 

Let’s look at some scenarios for using fallback and circuit breaker. We have Customer Service which calls API method from Account Service. There two running instances of Account Service. The requests to Account Service instances are load balanced by Ribbon client 50/50.

micro-details-1

Scenario 1

Hystrix is disabled for Feign client (1), auto retries mechanism is disabled for Ribbon client on local instance (2) and other instances (3). Ribbon read timeout is shorter than request max process time (4). This scenario also occurs with the default Spring Cloud configuration without Hystrix. When you call customer test method you sometimes receive full response and sometimes 500 HTTP error code (50/50).

ribbon:
  eureka:
    enabled: true
  MaxAutoRetries: 0 #(2)
  MaxAutoRetriesNextServer: 0 #(3)
  ReadTimeout: 1000 #(4)

feign:
  hystrix:
    enabled: false #(1)

Scenario 2

Hystrix is still disabled for Feign client (1), auto retries mechanism is disabled for Ribbon client on local instance (2) but enabled on other instances once (3). You always receive full response. If your request is received by instance with delayed response it is timed out after 1 second and then Ribbon calls another instance – in that case not delayed. You can always change MaxAutoRetries to positive value but gives us nothing in that sample.

ribbon:
  eureka:
    enabled: true
  MaxAutoRetries: 0 #(2)
  MaxAutoRetriesNextServer: 1 #(3)
  ReadTimeout: 1000 #(4)

feign:
  hystrix:
    enabled: false #(1)

Scenario 3

Here is not a very elegant solution to the problem. We set ReadTimeout on value bigger than delay inside API method (5000 ms).

ribbon:
  eureka:
    enabled: true
  MaxAutoRetries: 0
  MaxAutoRetriesNextServer: 0
  ReadTimeout: 10000

feign:
  hystrix:
    enabled: false

Generally configuration from Scenario 2 and 3 is right, you always get the full response. But in some cases you will wait more than 1 second (Scenario 2) or more than 5 seconds (Scenario 3) and delayed instance receives 50% requests from Ribbon client. But fortunately there is Hystrix – circuit breaker.

Scenario 4

Let’s enable Hystrix just by removing feign property. There is no auto retries for Ribbon client (1) and its read timeout (2) is bigger than Hystrix’s timeout (3). 1000ms is also default value for Hystrix timeoutInMilliseconds property. Hystrix circuit breaker and fallback will work for delayed instance of account service. For some first requests you receive fallback response from Hystrix. Then delayed instance will be cut off from requests, most of them will be directed to not delayed instance.

ribbon:
  eureka:
    enabled: true
  MaxAutoRetries: 0 #(1)
  MaxAutoRetriesNextServer: 0
  ReadTimeout: 2000 #(2)

hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 1000 #(3)

Scenario 5

This scenario is a more advanced development of Scenario 4. Now Ribbon timeout (2) is lower than Hystrix timeout (3) and also auto retries mechanism is enabled (1) for local instance and for other instances (4). The result is same as for Scenario 2 and 3 – you receive full response, but Hystrix is enabled and it cuts off delayed instance from future requests.

ribbon:
  eureka:
    enabled: true
  MaxAutoRetries: 3 #(1)
  MaxAutoRetriesNextServer: 1 #(4)
  ReadTimeout: 1000 #(2)

hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 10000 #(3)

I could imagine a few other scenarios. But the idea was just a show differences in circuit breaker and fallback when modifying configuration properties for Feign, Ribbon and Hystrix in application.yml.

Hystrix

Let’s take a closer look on standard Hystrix circuit breaker and ¬†usage described in Scenario 4. To enable Hystrix in your Spring Boot application you have to following dependencies to pom.xml. Second step is to add annotation @EnableCircuitBreaker to main application class and also @EnableHystrixDashboard if you would like to have UI dashboard available.

<dependency>
	<groupId>org.springframework.cloud</groupId>
	<artifactId>spring-cloud-starter-hystrix</artifactId>
</dependency>
<dependency>
	<groupId>org.springframework.cloud</groupId>
	<artifactId>spring-cloud-starter-hystrix-dashboard</artifactId>
</dependency>

Hystrix fallback is set on Feign client inside customer service.

@FeignClient(value = "account-service", fallback = AccountFallback.class)
public interface AccountClient {

    @RequestMapping(method = RequestMethod.GET, value = "/accounts/customer/{customerId}")
    List<Account> getAccounts(@PathVariable("customerId") Integer customerId);

}

Fallback implementation is really simple. In this case I just return empty list instead of customer’s account list received from account service.

@Component
public class AccountFallback implements AccountClient {

	@Override
	public List<Account> getAccounts(Integer customerId) {
		List<Account> acc = new ArrayList<Account>();
		return acc;
	}

}

Now, we can perform some tests. Let’s start discovery service, two instances of account service on different ports (-DPORT VM argument during startup) and customer service. Endpoint for tests is /customers/{id}. There is also JUnit test class which sends multiple requests to this enpoint available in customer-service module¬†pl.piomin.microservices.customer.ApiTest.

	@RequestMapping("/customers/{id}")
	public Customer findById(@PathVariable("id") Integer id) {
		logger.info(String.format("Customer.findById(%s)", id));
		Customer customer = customers.stream().filter(it -> it.getId().intValue()==id.intValue()).findFirst().get();
		List<Account> accounts =  accountClient.getAccounts(id);
		customer.setAccounts(accounts);
		return customer;
	}

I enabled Hystrix Dashboard on account-service main class. If you would like to access it call from your web browser http://localhost:2222/hystrix address and then type Hystrix’s stream address from customer-service http://localhost:3333/hystrix.stream. When I run test that sends 1000 requests to customer service¬†about 20 (2%) of them were forwarder to delayed instance of account service, remaining to not delayed instance. Hystrix dashboard during that test is visible below. For more advanced Hystrix configuration refer to its documentation available¬†here.

hystrix-1

Part 2: Creating microservices – monitoring with Spring Cloud Sleuth, ELK and Zipkin

One of the most frequently mentioned challenges related to the creation of microservices based architecture is monitoring. Each microservice should be run on an environment isolated from the other microservices, so it does not share resources such as databases or log files with them. However, the essential requirement for microservices architecture is relatively easy to access the call history, including the ability to look through the request propagation between multiple microservices. Grepping the logs is not the right solution for that problem. There are some helpful tools which can be used when creating microservices with Spring Boot and Spring Cloud frameworks.

Spring Cloud Sleuth – library available as a part of Spring Cloud project. Lets you track the progress of subsequent microservices by adding the appropriate headers to the HTTP requests. The library is based on the MDC (Mapped Diagnostic Context) concept, where you can easily extract values put to context and display them in the logs.

Zipkin – distributed tracing system that helps to gather timing data for every request propagated between independent services. It has simple management console where we can find visualization of the time statistics generated by subsequent services.

ELK РElasticsearch, Logstash, Kibana: three different tools usually used together. They are used for searching, analyzing, and visualizing log data in a real time.

Probably many of you, even if you have not had a touch with Java or microservices before, heard about Logstash and Kibana. For example, if you look at the hub.docker.com among the most popular images you will find the ones for the above tools. In our example, we will just use those images. Let’s begin from running container with Elasticsearch.

docker run -d -it --name es -p 9200:9200 -p 9300:9300 elasticsearch

The we can run Kibana container and link it to the Elasticsearch.

docker run -d -it --name kibana --link es:elasticsearch -p 5601:5601 kibana

At the end we will start Logstash with input and output declared. As an input we declare TCP which is compatible with LogstashTcpSocketAppender used as a logging appender in our sample application. As an output elasticsearch has been declared. Each microservice will be indexed on its name with micro prefix.

docker run -d -it --name logstash -p 5000:5000 logstash -e 'input { tcp { port => 5000 codec => "json" } } output { elasticsearch { hosts => ["192.168.99.100"] index => "micro-%{serviceName}"} }'

Now we can take a look on sample microservices. This post is a continuation of my previous article Part 1: Creating microservice using Spring Cloud, Eureka and Zuul. Architecture and exposed services are the same as in the previous sample. Source code is available on GitHub (branch logstash). Like a mentioned before we will use Logback library for sending log data to Logstash.¬†In addition to the three Logback dependencies we also add libraries for Zipkin integration and Spring Cloud Sleuth starter. Here’s fragment of pom.xml for microservice.

		<dependency>
			<groupId>org.springframework.cloud</groupId>
			<artifactId>spring-cloud-starter-sleuth</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.cloud</groupId>
			<artifactId>spring-cloud-sleuth-zipkin</artifactId>
		</dependency>
		<dependency>
			<groupId>net.logstash.logback</groupId>
			<artifactId>logstash-logback-encoder</artifactId>
			<version>4.9</version>
		</dependency>
		<dependency>
			<groupId>ch.qos.logback</groupId>
			<artifactId>logback-classic</artifactId>
			<version>1.2.3</version>
		</dependency>
		<dependency>
			<groupId>ch.qos.logback</groupId>
			<artifactId>logback-core</artifactId>
			<version>1.2.3</version>
		</dependency>

There is also Logback configuration file in src/main/resources directory. Here’s logback.xml fragment. We can configure which logging field are sending to Logstash by declaring tags mdc, logLevel, message etc. We are also appending service name field for elasticsearch index creation.

	<appender name="STASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
		<destination>192.168.99.100:5000</destination>

		<encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
			<providers>
				<mdc />
				<context />
				<logLevel />
				<loggerName />

				<pattern>
					<pattern>
						{
						"serviceName": "account-service"
						}
					</pattern>
				</pattern>

				<threadName />
				<message />
				<logstashMarkers />
				<stackTrace />
			</providers>
		</encoder>
	</appender>

The configuration of Spring Cloud Sleuth is very simple. We only have to add spring-cloud-starter-sleuth dependency to pom.xml and declare sampler @Bean . In the sample I declared AlwaysSampler that exports every span, but there is also an other other option РPercentageBasedSampler that samples a fixed fraction of spans.

	@Bean
	public AlwaysSampler defaultSampler() {
	  return new AlwaysSampler();
	}

After starting ELK docker containers we need to run our microservices. There are 5 Spring Boot applications which need to be run discovery-service, account-service, customer-service, gateway-service and zipkin-service. After launching all of them we can try call some services, for example http://localhost:8765/api/customer/customers/{id}, which causes calling of both customer and account service. All logs will be stored in elasticsearch with micro-%{serviceName} index. They can be searched in Kibana with micro-* index pattern. Index patterns are created in Kibana under section Management -> Index patterns. Kibana is available under address http://192.168.99.100:5601. After first running we will be prompt for index pattern, so let’s type micro-*. Under Discover section we can take o look on all logs matched typed pattern with timeline visualization.

kibana2

Kibana is rather intuitive and user friendly tool. I will not describe in the details how to use Kibana, because you can easily find it out by yourself reading a documentation or just clicking UI. The most important thing is to be able to search a logs by filtering criteria. In the picture below there is an example of searching logs by X-B3-TraceId field, which is add to the request header by Spring Cloud Sleuth. Sleuth also adds X-B3-SpanId for marking request for single microservice. We can select which field are displayed in the result list – in this sample I selected message and serviceName like you see in the left pane of the picture.

kibana1

Here’s a picture with single request details. It is visible after expanding each log row.

kibana3

Spring Cloud Sleuth also sends statistics to Zipkin. That is another kind of data than is stored in Logstash. These are timing statistics for each request. Zipkin UI is really simple. You can filter the requests by some criteria like time, service name, endpoint name. Here’s picture with same requests which were visualized with Kibana: http://localhost:8765/api/customer/customers/{id}.

zipkin-1

We can always see the details of each request by clicking on it. Then you see the picture similar to visible below. In the beginning, the request has been processed on API gateway. Then gateway discovered customer service on Eureka server and called that service. Customer service also has to discover account service and then call it. In this view you can easily find out which operation is the most time consuming.

zipkin-3

Microservices with Kubernetes and Docker

In one¬†of my previous posts I described an example of continuous delivery configuration for building microservices with Docker and Jenkins. It was a simple¬†configuration where I decided to use only Docker Pipeline Plugin for building and running containers with microservices. That solution had one big disadvantage – we had to link all containers between each other to provide communication between microservices deployed inside those containers. Today I’m going to¬†present you one the smart solution which helps us to avoid that problem – Kubernetes.

Kubernetes is an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts, providing container-centric infrastructure. It was originally designed by Google. It has many features especially useful for applications running in production like service naming and discovery, load balancing, application health checking, horizontal auto-scaling or rolling updates. There are several important concepts around Kubernetes we should know before going into the sample.

Pod Рthis is basic unit in Kubernetes. It can consists of one or more containers that are guaranteed to be co-located on the host machine and share the same resources. All containers deployed inside pod can see other containers via localhost. Each pod has a unique IP address within the cluster

Service –¬†is a set of pods that work together.¬†By default a service is exposed inside a cluster but it can also be exposed¬†onto an external ¬†IP address outside your cluster. We can expose it using one of four available behaviors:¬†ClusterIP, NodePort, LoadBalancer and ExternalName.

Replication Controller Рit is specific type of Kubernetes controllers. It handles replication and scaling by running a specified number of copies of a pod across the cluster. It is also responsible for pods replacement if the underlying node fails.

Minikube

Configuration of highly available Kubernetes cluster is rather not easy task to perform. Fortunately, there is¬†a tool that makes it easy to run Kubernetes locally – Minikube. It can run¬†a single-node¬†cluster inside a VM, what is really important for developers who want to try it out. The beginning¬†is really easy. For example on Windows, you have to download minikube.exe and kubectl.exe and add them to PATH environment variable. Then you can start it from command line using minikube start command and use almost all of Kubernetes features available by calling kubectl command. ¬†An alternative for command line option is Kubernetes Dashboard. It can be launched by calling minikube dashboard command. We can create, update or delete deployment from UI dashboard, and also list and view a configuration of all pods, services, ingresses, replication controller etc. Here’s Kubernetes Dashboard with the list of deployments for our sample.

kube1

Application

The concept of microservices architecture for our sample is pretty similar to the concept from my article about continuous delivery with Docker and Jenkins which I mentioned in the beginning of that article. We also have account and customer microservices. Customer service is interacting with account service while searching for customer accounts. We do not use gateway (Zuul) and discovery (Eureka) Spring Boot services, because we have such mechanisms available on Kubernetes out of the box. Here’s the picture illustrating the architecture of presented solution. Each microservice’s pod consists of two containers: first with microservice application and second with Mongo database. Account and customer microservices have their own database where all data is stored. Each pod is exposed as a service and can by searched by name on Kubernetes. We also configure Kubernetes Ingress which acts as a gateway for our microservices.

kube_micro

Sample application source code is available on GitHub. It consists of two modules account-service and customer-service. It is based on Spring Boot framework, but doesn’t use any of Spring Cloud projects except Feign client. Here’s dockerfile from account service. We use small openjdk image – alpine. Thanks to that our result image will have about ~120MB instead of ~650MB when using standard openjdk as an base image.

FROM openjdk:alpine
MAINTAINER Piotr Minkowski <piotr.minkowski@gmail.com>
ADD target/account-service.jar account-service.jar
ENTRYPOINT ["java", "-jar", "/account-service.jar"]
EXPOSE 2222

To enable MongoDB support I add spring-boot-starter-data-mongodb dependency to pom.xml. We also have to provide connection data to application.yml and annotate entity class with @Document. The last think is to declare repository interface extending MongoRepository which has basic CRUD methods implemented. We add two custom find methods.

public interface AccountRepository extends MongoRepository<Account, String> {

    public Account findByNumber(String number);
    public List<Account> findByCustomerId(String customerId);

}

In customer service we are going to call API method from account service. Here’s declarative REST client @FeignClient declaration. All the pods with account service are available under the account-service name and default service port – 2222. Such settings are the results of the service configuration on Kubernetes. I will describe it in the next section.

@FeignClient(name = "account-service", url = "http://account-service:2222")
public interface AccountClient {

	@RequestMapping(method = RequestMethod.GET, value = "/accounts/customer/{customerId}")
	List<Account> getAccounts(@PathVariable("customerId") String customerId);

}

The docker image of our microservices can be build with the command visible below. After build you should push that image to official docker hub or your private registry. In the next section I’ll describe how to use them on Kubernetes. Docker images of the described microservices are also available on my Docker Hub public repository as piomin/account-service and piomin/customer-service.

docker build -t piomin/account-service .
docker push piomin/account-service

Kubernetes deployment

You can create deployment on Kubernetes using kubectl run command, Minikube dashboard or JSON configuration files with kubectl create command. I’m going to show you how to create all resources from JSON configuration files, because we need to create multi-containers deployments in one step. Here’s deployment configuration file for account-service. We have to provide deployment name, image name and exposed port. In the replicas property we are setting requested number of created pods.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: account-service
  labels:
    run: account-service
spec:
  replicas: 1
  template:
    metadata:
      labels:
        run: account-service
    spec:
      containers:
      - name: account-service
        image: piomin/account-service
        ports:
        - containerPort: 2222
          protocol: TCP
      - name: mongo
        image: library/mongo
        ports:
        - containerPort: 27017
          protocol: TCP

We are creating new deployment by running command below. The same command is used for creating services and ingress. Only JSON file format is different.

kubectl create -f deployment-account.json

Now, let’s take o look on service configuration file. We have already created deployment. As you could see in the dashboard image has been pulled from Docker Hub, pod and replica set has been created. Now, we would like to expose our microservice outside. That’s why service is needed. We are also exposing Mongo database on its default port, to be able to connect database and create collections from MongoDB client.

kind: Service
apiVersion: v1
metadata:
  name: account-service
spec:
  selector:
    run: account-service
  ports:
    - name: port1
protocol: TCP
      port: 2222
      targetPort: 2222
    - name: port2
protocol: TCP
      port: 27017
      targetPort: 27017
  type: NodePort

kube-2

After creating similar configuration for customer service we have our microservices exposed. Inside kubernetes they are visible on default ports (2222 and 3333) and service name. That’s why inside customer service REST client (@FeignClient) we declared URL http://account-service:2222. No matter how many pods have been created service will always be available on that URL and requests are load balanced between all pods be Kubernetes out of the box. If we would like to access each service outside Kubernetes, for example in the web browser we need to call it with port visible below container default port – in that sample for account service it is 31638 port and for customer service 31171 port. If you have ran Minikube on Windows your Kubernetes is probably available under 192.168.99.100 address, so you could try to call account service using URL http://192.168.99.100:31638/accounts. Before such test you need to create collection on Mongo database and user micro/micro which is set for that service inside application.yml.

kube-3

Ok, we have our two microservices available under two different ports. It is not exactly what we need. We need some kind of gateway available under on IP which proxies our requests to exact service by matching request path. Fortunately, such an option is also available on Kubernetes. This solution is Ingress. Here’s JSON ingress configuration file. There are two rules defined, first for account-service and second for customer service. Our gateway is available under micro.all host name and default¬†HTTP port.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: gateway-ingress
spec:
  backend:
    serviceName: default-http-backend
    servicePort: 80
  rules:
  - host: micro.all
    http:
      paths:
      - path: /account
        backend:
          serviceName: account-service
          servicePort: 2222
      - path: /customer
        backend:
          serviceName: customer-service
          servicePort: 3333

The last thing that needs to be done to make the gateway working is to add following entry to system hosts file (/etc/hosts for linux and C:\Windows\System32\drivers\etc\hosts for windows). Now, you could try to call from your web browser http://micro.all/accounts or http://micro.all/customers/{id}, which also calls account service in the background.

[MINIKUBE_IP] micro.all

Conclusion

Kubernetes is a great tool for microservices clustering and orchestration. It is still relatively new solution under active development. It can be used together with Spring Boot stack or as an alternative for Spring Cloud Netflix OSS, which seems to be the most popular solution for microservices now.¬† It has also UI dashboard where you can manage and monitor all resources. Production grade configuration is probably more complicated than single host development configuration with Minikube, but I don’t that it is solid argument against Kubernetes.

Advanced Microservices Security with OAuth2

In one of my previous posts I described the basic sample illustrating microservices security with Spring Security and OAuth2. You could read there how to create and use authorization and resource server, basic authentication and bearer token with Spring Boot. Now, I would like to introduce more advanced sample with SSO OAuth2 behind Zuul gateway. Architecture of newest sample is rather similar to the previous sample like you can see in the picture below. The difference is in implementation details.

oauth2

Requests to the microservices and authorization server are proxied by the gateway. First request is redirected to the login page. We need to authenticate. User authentication data is stored in MySQL database. After login there is also stored user HTTP session data using Spring Session library. Then you should to perform next steps to obtain OAuth2 authorization token by calling authorization server enpoints via gateway. Finally, you can call concrete microservice providing OAuth2 token as a bearer in Authorization HTTP request header.

If you are interested in technical details of the presented solution you can read my article on DZone. There is also available sample application source code on GitHub.