From data chaos to clarity: the fleet report fix.

In today’s fast-paced world, fleet efficiency is more critical than ever. At Volvo Group Connected Solutions, my product team and I work at the forefront of delivering innovative solutions to enhance fuel and energy efficiency across vehicle fleets. This includes insightful reports, customizable dashboards, and automated reporting features. In this article I will go through some of the technical challenges we faced, the strategic steps we took to overcome them, and the remarkable outcomes we achieved.
Improving reporting performance for fleet efficiency

What we do

My product team, comprising Business Analysts, an Area Architect, Domain Architects, a Product Owner, Developers and Testers, is part of a product domain called Fuel and Environment. We empower fleet owners by delivering insightful reports that facilitate a deeper understanding of fuel and energy efficiency across their vehicle fleets. Our comprehensive analytics cover all Volvo Group-owned vehicles, and we offer customizable dashboards that allow customers to segment data by individual drivers and vehicles. Fleet owners can also analyze specific vehicle metrics through both tabular and graphical representations. Furthermore, we provide an automated reporting feature, enabling customers to receive consolidated performance summaries on a weekly or monthly basis for streamlined decision-making.

 

A few main use cases for reports

  • Month-end reporting for salary and financial record keeping
  • Weekly review of all drives
  • Yearly review of vehicle performance, comparing vehicle data across years

 

Detailed insight into business specifics

Within the Fuel and Environment service we manage a substantial volume of vehicles. Approximately 300,000 of the 1.7 million connected vehicles report data to us, which means that we receive around 21,000 messages per second (roughly one message per vehicle every 14 seconds). This data must be stored efficiently to meet the diverse needs of the end users. Our service caters to all countries, which necessitates 24-hour availability to accommodate global requirements.

Our service also provides automated mechanisms that allow fleet owners to receive their reports at predefined, scheduled intervals. Fleets come in diverse sizes (ranging from small fleets of fewer than 20 to large fleets of more than 1,000 connected vehicles), and most reports are scheduled for the end of the month and/or week. This challenges our service to handle high request volumes efficiently and asynchronously, at approximately 1,000 requests per second.

Our system offers the flexibility of configuring scheduled reports, which places a technical responsibility on us to ensure that these reports are generated before business hours in specific countries. We distribute approximately 22k reports every month and 12k reports each week as part of our service.
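To make the scheduling constraint concrete, here is a minimal sketch of how such a job could be expressed with Spring's scheduler. The class name, cron expression and timezone are illustrative assumptions, not our production configuration:

```java
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ReportDistributionJob {

    // Run early in the morning, local time, so reports are ready before
    // business hours. In practice each market needs its own schedule and
    // zone; the values below are placeholders.
    @Scheduled(cron = "0 0 4 * * MON", zone = "Europe/Stockholm")
    public void distributeWeeklyReports() {
        // 1. Look up fleets whose weekly report is due today.
        // 2. Generate each report asynchronously to absorb request peaks.
        // 3. Deliver the consolidated summary to the fleet owner.
    }
}
```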

 

Challenges

Customers faced challenges with report delivery, particularly slow rendering times and delays in scheduled distributions. These issues hindered timely access to critical information, affecting decision-making processes. Addressing these performance bottlenecks was crucial to improving the user experience and ensuring prompt, efficient report delivery.

Our infrastructure was strained by heavy database resource utilization, with high read/write IOPS and maxed-out CPU usage. This posed a significant risk of database crashes, potentially leading to a total service disruption.

 

How did we address these challenges?

Quick overview of our tech stack

Our service is developed using the latest version of Spring Boot, which handles most of the business logic, while the UI layer is built on a React.js portal framework. We host all our applications on AWS, using EC2 instances, and our database layer is Oracle Database 21c. We chose Oracle for its ability to manage high data traffic and large volumes of data.

 

Analysis that we did

Before going into fixing the issue, we did a deep analysis of the root cause, as well as an end-to-end evaluation of all the bottlenecks that were degrading performance.

 

Information Model
The information model was not aligned with the access patterns, so we analyzed the different access models. Below are a few questions used as input for this analysis:

  • What data is queried the most?
  • Which filters are used most commonly (date range, asset)?
  • How frequently is each function used (daily, weekly, monthly)?

 

Business logic

How is the code aligned to support report rendering?

  • Are we making too many unnecessary calls to the database?
  • Are we using the right, and only the necessary, parameters in our queries?
  • Do we have the right indexes on our tables?
  • How are we handling the fetched data before sending it back to the UI?
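As a concrete illustration of the last three questions, here is a minimal sketch of a narrowly parameterized, paged query using Spring's JdbcTemplate against Oracle. The table, column and class names are hypothetical; the point is to bind only the parameters the report needs and keep every round trip bounded:

```java
import java.time.LocalDate;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class FuelUsageQueries {

    private final JdbcTemplate jdbc;

    public FuelUsageQueries(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Filter in the database, not in application code, and page the result.
    // Oracle's OFFSET/FETCH syntax (12c+) keeps each round trip small; an
    // index on (vehicle_id, usage_date) would support this access path.
    public List<FuelUsageRow> dailyUsage(String vehicleId, LocalDate from,
                                         LocalDate to, int offset, int pageSize) {
        return jdbc.query(
            """
            SELECT usage_date, fuel_litres, distance_km
              FROM daily_fuel_usage
             WHERE vehicle_id = ?
               AND usage_date BETWEEN ? AND ?
             ORDER BY usage_date
            OFFSET ? ROWS FETCH NEXT ? ROWS ONLY
            """,
            (rs, i) -> new FuelUsageRow(rs.getDate(1).toLocalDate(),
                                        rs.getDouble(2), rs.getDouble(3)),
            vehicleId, from, to, offset, pageSize);
    }

    public record FuelUsageRow(LocalDate date, double fuelLitres, double distanceKm) {}
}
```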

 

User Interface

  • Do we send more data to the UI than required?
  • Are we trying to load all the data at once?

 

Steps towards success

Our primary challenge was that the information model didn’t align with our access patterns, leading to inefficiencies and bottlenecks. Fixing this issue was challenging, as it required extensive modifications to the data model. Given the increasing volume of incoming traffic, we knew we needed to make these changes carefully and strategically. We decided to tackle the problem in a stepwise manner, allowing us to implement adjustments gradually while minimizing disruption. This approach not only made the process more manageable but also ensured that we could continuously monitor performance improvements along the way.

 

Step-by-step approach: simple but expensive

  • Our primary goal was to minimize disruption to our customers’ businesses, so we opted to enhance our infrastructure to handle the increased load. Through careful monitoring, we identified that read IOPS were the most affected. Since we were using a cloud solution, scaling up was straightforward. To ensure minimal to no interruption in customer services, we increased the infrastructure capacity while simultaneously analyzing the system to pinpoint the core bottlenecks.

 

Digging deeper into the information model

It’s crucial to regularly review the access patterns observed in production. These reviews help us determine whether the data is stored appropriately for its intended use.

  • Normalization is good, but too many foreign key associations should be avoided; storing data as close to where it is needed as possible is a good option. This can result in data being repeated, but with good data structuring and the right indexes it can be managed well. We identified a few such associations based on the frequency of access and changed the queries accordingly. This also meant that the data migration needed specific handling.
  • Long-running queries: examining the access patterns revealed skewed indexes, i.e. indexes over unevenly distributed data. In this specific case, a heavily skewed column was part of an index, and since this value was crucial to most access patterns, the issue had to be addressed. We redesigned the data based on the access patterns (most of our reports focus on daily usage of fuel parameters), but we also needed to accommodate reports based on driver sessions. To address this, we separated the data into different tables, tagged for the two distinct types of retrieval. We then revised all query implementations and developed scripts to segregate five years’ worth of data to align with the new structure.
  • Frequency of access: our monitoring revealed that most reports were requested at the end of the week and month. While we had aggregated data for these periods, we realized we weren’t using that cached data effectively. We needed to split date-range requests into daily, weekly and monthly aggregates in order to efficiently stack the different periods for longer time ranges. For a request spanning one year, we first evaluate the relevant months, then the appropriate weeks, and fetch the remainder of the data on a daily basis (see the sketch after this list).
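The decomposition can be pictured as a greedy walk over the requested range, preferring whole months, then whole weeks, then single days. The sketch below is an illustrative simplification (the names are hypothetical, and real code must handle calendar conventions such as which weekday starts the week):

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Splits a requested date range into segments that can each be served from a
// pre-aggregated table: whole months first, then whole weeks, then single days.
public class DateRangePlanner {

    public sealed interface Segment permits Month, Week, Day {}
    public record Month(LocalDate firstDay) implements Segment {}
    public record Week(LocalDate monday) implements Segment {}
    public record Day(LocalDate date) implements Segment {}

    public static List<Segment> plan(LocalDate from, LocalDate toInclusive) {
        List<Segment> segments = new ArrayList<>();
        LocalDate cursor = from;
        while (!cursor.isAfter(toInclusive)) {
            LocalDate monthEnd = cursor.withDayOfMonth(cursor.lengthOfMonth());
            LocalDate weekEnd = cursor.plusDays(6);
            if (cursor.getDayOfMonth() == 1 && !monthEnd.isAfter(toInclusive)) {
                segments.add(new Month(cursor));      // whole month is covered
                cursor = monthEnd.plusDays(1);
            } else if (cursor.getDayOfWeek() == DayOfWeek.MONDAY
                       && !weekEnd.isAfter(toInclusive)) {
                segments.add(new Week(cursor));       // whole ISO week is covered
                cursor = weekEnd.plusDays(1);
            } else {
                segments.add(new Day(cursor));        // fall back to daily data
                cursor = cursor.plusDays(1);
            }
        }
        return segments;
    }
}
```

With this scheme, a request for a full calendar year resolves to twelve monthly aggregates and never touches the daily tables.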

 

The Client layer

  • After the improvements from remodeling the information model, the next step was to address the UI, which was choked because more data was returned than required. Using a technology like GraphQL to reduce network traffic and client complexity led to massive improvements.
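To illustrate the idea (this is not our actual schema), a minimal controller using Spring for GraphQL could look like the sketch below. The client names exactly the fields it wants, so unused columns never cross the network:

```java
import java.util.List;
import org.springframework.graphql.data.method.annotation.Argument;
import org.springframework.graphql.data.method.annotation.QueryMapping;
import org.springframework.stereotype.Controller;

// Hypothetical types and field names. A client asking only for dates and fuel,
//   query { vehicleReport(vehicleId: "ABC123") { date fuelLitres } }
// receives exactly those two fields and nothing more.
@Controller
public class ReportGraphqlController {

    public record ReportRow(String date, double fuelLitres,
                            double distanceKm, double co2Kg) {}

    @QueryMapping
    public List<ReportRow> vehicleReport(@Argument String vehicleId) {
        // Fetch from the remodeled tables; GraphQL trims the response to the
        // fields the client requested before serialization.
        return List.of(); // placeholder for the real data access call
    }
}
```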

 

Rolling it out to production

After identifying the problem, finding an optimal solution was straightforward. The real challenge was determining how to implement it in production. We migrated the data from the old structure to the new one and launched the feature gradually, rolling it out customer by customer. This approach not only allowed us to assess performance improvements but also ensured a thorough quality check of the rendered data.
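Conceptually, the rollout behaved like a per-customer switch between the old and new data paths. The sketch below (with hypothetical names) shows the routing idea; the allow-list grows as each customer’s data passes the quality check:

```java
import java.util.Set;

// Routes report requests to the remodeled data model only for allow-listed
// customers; everyone else stays on the legacy path until verified.
public class ReportModelRouter {

    private final Set<String> customersOnNewModel;
    private final ReportRepository legacyRepo;
    private final ReportRepository remodeledRepo;

    public ReportModelRouter(Set<String> customersOnNewModel,
                             ReportRepository legacyRepo,
                             ReportRepository remodeledRepo) {
        this.customersOnNewModel = customersOnNewModel;
        this.legacyRepo = legacyRepo;
        this.remodeledRepo = remodeledRepo;
    }

    public Report render(String customerId, ReportRequest request) {
        ReportRepository repo = customersOnNewModel.contains(customerId)
                ? remodeledRepo   // migrated customer: new table structure
                : legacyRepo;     // not yet migrated: old structure
        return repo.fetch(request);
    }

    public interface ReportRepository { Report fetch(ReportRequest request); }
    public record ReportRequest(String vehicleId) {}
    public record Report(String payload) {}
}
```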

 

Results

What did we gain after all the changes?

  • Customers could fetch five-year reports.
  • Better-utilized database resources, which allowed us to scale down and save costs.

Bio: Vinitha Rajagopal
I am a Senior Solution Architect working at Volvo Group Connected Solutions, currently focused on the Charging product area. I’ve been with this incredible company for 11 years, and I’m constantly amazed by how our products (trucks and buses) are more intelligent than the smartest phones. This has only deepened my passion for IoT. As an architect, I believe that business and technology should go hand in hand, and at Volvo Group, I get to do exactly that. Outside of work, I enjoy trekking, and being in the Himalayas is always a rejuvenating experience.

 

Connect with Vinitha on LinkedIn