Vitria

Case Study: Learn Some Lessons From TXU Energy's Operational Intelligence System

The knowledge gained by TXU Energy when it implemented its process-monitoring system is applicable to organizations that need to improve the effectiveness and service levels of complex, end-to-end processes that encompass heterogeneous business applications.

Overview

This research explains how real-time operational intelligence is used to improve customer service for complex end-to-end business processes in a virtual enterprise. IT leaders, business leaders and business analysts must understand best practices to minimize their risk and achieve rapid payback.

Key Findings
  • End-to-end operational intelligence reveals problems that cannot be detected by silo management reports and monitoring systems that track only individual applications.
  • Operational intelligence does not require a workflow or process orchestration tool to actively drive the end-to-end process. The event data required for end-to-end monitoring can be captured from database management systems (DBMSs) or other aspects of the business applications.
  • Significant business events typically occur when a transaction crosses from one application to another, from one company to another, or when a person finishes work on a task. In most cases, business people do not need or want to see fine-grained details on the minor events that happen within application systems.

Recommendations
  • Apply real-time operational intelligence at two levels: (1) to correct problems that affect individual process instances (e.g., invalid data in a customer service request) and (2) to correct problems that affect large groups of process instances (e.g., when an entire application system is down).
  • Monitor end-to-end business processes using a coarse-grained, summary process model. In most cases, no more than seven to 10 significant milestones for each process should be shown on a business dashboard.
  • Develop operational intelligence applications in an iterative manner, refining the business dashboards, key performance indicators (KPIs), alert thresholds and problem resolution processes as you gain experience with the system.

Analysis
What You Need to Know

Real-time operational intelligence systems enable timely action that improves the efficiency and effectiveness of company operations. They provide one or more capabilities, such as business activity monitoring (BAM), advanced analytics, other kinds of decision management and process orchestration to resolve exceptions. BAM is a subset of operational intelligence; in a broad sense, it includes all business-oriented monitoring that provides current information on the conditions and changes that happen in a company and its external environment. However, the term "BAM" is usually applied only to business process monitoring (such as what occurs in this operational intelligence application) or to monitoring applications that lack a specific functional label, such as supply chain visibility or fraud detection.

BAM has become relatively common for individual application systems and for business processes that are orchestrated at runtime by workflow or business process orchestration tools. However, BAM is still relatively uncommon for processes that encompass multiple heterogeneous application systems, especially if they are not orchestrated by a runtime workflow or business process management tool. Most business processes fit into this category: they run "in the dark," because they don't have end-to-end process monitoring.

This TXU Energy case study demonstrates how a small development team can quickly implement a process monitoring solution that gives business people much-improved visibility into complicated business processes, leading to substantial improvements in customer service levels and customer satisfaction.

Introduction

The Texas electricity market is deregulated, so generating and transmitting electricity are separate from retail delivery. Multiple parties are involved, including:

  • Consumers
  • Retail electricity suppliers, such as TXU, that provide customer service, arrange provisioning and handle billing
  • Texas Transmission and Distribution Service Providers (TDSPs) that provide physical support (poles and wires) and read meters
  • Companies that generate and sell bulk electricity
  • The Electric Reliability Council of Texas (ERCOT), the independent system operator for 75% of Texas; ERCOT runs a hub that connects retail electricity suppliers and consumers, brokers the communication when consumers enroll or switch to a new provider, and manages the physical flow of electric power and the settlement of wholesale bulk-power transactions

TXU has more than 2 million customers, making it the largest retail supplier in Texas. It offers consumer power plans and electricity bill payment assistance programs. The success of a retail electricity supplier depends on its ability to manage the flow of work among the participants in the complex provisioning and delivery processes. Customers can switch to another supplier if they are dissatisfied. As Kevin Chase, TXU's CIO, explained, "Customer service is crucial to maintaining our position as the leading energy company in the highly competitive Texas market." TXU's goal is to process 100% of customers' requests on the day promised.

The Challenge

TXU's goal was to improve customer service and reduce customer churn by implementing a new real-time operational intelligence system (sometimes called BAM, process intelligence, enterprise workflow monitoring or operational event workflow analysis). The new system was intended to provide visibility into its end-to-end processes, improve the process flow, eliminate bottlenecks that result in delays and make it possible to take immediate corrective action for emerging problems. It focused on two key business processes.

Customer Onboarding (Enrollment and Move-In)

TXU receives thousands of new service requests per day. At any given time, the system must monitor the status of all outstanding onboarding requests that are at some stage in their life cycle. Some are provisioned on the same day that the request is made; others may take several days.

Disconnect/Reconnect

TXU also handles thousands of events per day related to disconnects and reconnects for people who have exceeded the 16-day payment period. Regulations require that reconnects that can be handled remotely through automated meters be completed within two hours of payment being received. Other service-level regulations govern customer reconnects that require on-site physical work.

Applications

TXU uses four primary application systems to support these processes:

  • SAP CRM for customer-facing transactions
  • SAP-ISU for billing, invoicing, payments and collections
  • GSX's Inovis BizLink for sending and receiving electronic data interchange (EDI) transactions with ERCOT
  • GSX's TrustedLink Enterprise (TLE) for translating documents from SAP to EDI format and vice versa

The flow of work among these applications is not orchestrated by a runtime workflow or process management tool. Furthermore, prior to this project, each application was managed separately using its own reporting system. Local process models were available for some of the applications. Metrics and exceptions for some systems could be seen by people in different departments, but there was no end-to-end view. The management reports did not cross the application silos and did not cover some metrics of interest to the business.

The TXU operational business unit charged with monitoring these processes, the Transaction Management team, had no centralized way of finding where a particular customer's enrollment was in the process. If a customer's enrollment was stalled, the company sometimes learned of the problem only when the customer called in to complain. The team would then have to start an exception process and track down the origin of the problem, which could take several days of investigation.

Approach

TXU first developed a prototype end-to-end BAM application using custom, homegrown code. The system had high overhead, did not refresh frequently enough and did not provide all of the KPIs that the Transaction Management team required. TXU also considered using the SAP Solution Manager product, but determined that this tool was better suited to system monitoring than to tracking business KPIs and end-to-end processes. TXU then turned to Vitria Technology's M3O Operational Intelligence Platform to provide the software technology infrastructure for the project. A consulting company, Sendero Business Services, provided management consulting and one application developer. One Vitria architect was also brought in to help implement the solution, giving the project a total of two software developers and one management consultant.

The first step in the project was to build end-to-end shadow tracking process models that described the flow of work across the applications. The shadow process models are used for BAM tracking purposes, but they are not used to control (orchestrate) the actual flow. The developers interviewed business people (subject matter experts) to get a general understanding of the major business events and overall flow at a conceptual level. Then they examined the application systems to determine the actual software transactions at a more-detailed logical level. In some cases, the exact sequences of activities in software were different from the processes as they were understood by business users.

The business events of interest reflect only major milestones, such as when a service request is submitted, when a transaction passes from one application to another, or when it goes to or from another company, such as ERCOT or a TDSP. Each business process in this case study had only nine major milestones to track. The shadow process models were expressed in M3O BPM, the Business Process Modeling Notation (BPMN)-based process modeling component of M3O. The same tool is also used in a slightly different way to orchestrate active business processes that carry out problem resolution after a problem has been detected.
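The shadow-model idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how M3O BPM represents its BPMN models: it assumes a simple ordered list of milestones (the names below are hypothetical; the case study does not list them) and only records when each milestone is first observed for a service request, without ever triggering work.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical milestone names. The real models had nine milestones per
# process (five in the POC), but their names are not given.
MILESTONES = [
    "request_submitted",  # entered in SAP CRM
    "sent_to_billing",    # handed to SAP-ISU
    "sent_to_ercot",      # EDI transaction leaves via BizLink
    "ercot_response",     # response received from ERCOT
    "service_active",     # provisioning confirmed
]


@dataclass
class ShadowInstance:
    """Tracks one service request against the model; it never drives the flow."""
    request_id: str
    reached: Dict[str, datetime] = field(default_factory=dict)

    def observe(self, milestone: str, at: datetime) -> None:
        if milestone not in MILESTONES:
            raise ValueError(f"unknown milestone: {milestone}")
        # Keep the first sighting so re-polled duplicates do not move the clock.
        self.reached.setdefault(milestone, at)

    def current_stage(self) -> Optional[str]:
        # Furthest milestone, in model order, that has been observed.
        seen = [m for m in MILESTONES if m in self.reached]
        return seen[-1] if seen else None

    def skipped(self) -> List[str]:
        # Earlier milestones never observed although a later one was. These
        # hint at places where the software flow differs from the model.
        stage = self.current_stage()
        if stage is None:
            return []
        return [m for m in MILESTONES[: MILESTONES.index(stage)] if m not in self.reached]


inst = ShadowInstance("R-1001")
inst.observe("request_submitted", datetime(2010, 6, 28, 9, 0))
inst.observe("sent_to_ercot", datetime(2010, 6, 28, 9, 40))
print(inst.current_stage(), inst.skipped())  # sent_to_ercot ['sent_to_billing']
```

A "skipped" milestone is exactly the kind of discrepancy the developers found between the documented process and the actual software transactions.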

The second step was to implement the connections into the four application systems. Business events are captured at runtime by polling the Oracle databases in the application systems every one to three minutes. Database rows with new or changed data are selected based on timestamps that are automatically generated by Oracle. One of the most complicated parts of the development project was understanding the internal hex identifiers that SAP uses to store data in key fields. Because the event data is captured from the Oracle DBMS, the SAP applications required only one small modification. The senior manager of operations development, who was responsible for the project from the business side, reported that the overhead of capturing the events from the SAP application databases is unnoticeable, because event data is collected for only a few major business events.
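Timestamp-based change capture of this kind can be sketched as a "high-water mark" query. The sketch below is an assumption-laden illustration: it uses Python's built-in sqlite3 as a stand-in for Oracle, and the table `request_status` and column `changed_at` are hypothetical placeholders for whatever database-generated timestamp the real system used.

```python
import sqlite3


def poll_changes(conn, high_water):
    """Return rows changed after `high_water`, plus the new high-water mark.

    Table and column names are hypothetical; `changed_at` stands in for the
    database-generated modification timestamp (ISO strings sort correctly).
    """
    rows = conn.execute(
        "SELECT request_id, status, changed_at FROM request_status "
        "WHERE changed_at > ? ORDER BY changed_at",
        (high_water,),
    ).fetchall()
    new_mark = rows[-1][2] if rows else high_water
    return rows, new_mark


# Demo against an in-memory SQLite table standing in for an application database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE request_status (request_id TEXT, status TEXT, changed_at TEXT)")
conn.executemany(
    "INSERT INTO request_status VALUES (?, ?, ?)",
    [("R-1", "submitted", "2010-06-28T09:00:00"),
     ("R-2", "submitted", "2010-06-28T09:01:00")],
)
batch, mark = poll_changes(conn, "")
# A scheduler would call poll_changes(conn, mark) again every one to three minutes.
```

One design caveat: a strict `>` comparison can miss a row that shares the last timestamp but commits after the poll, so real pollers often re-read a small overlapping window and de-duplicate.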

The third and final step was to develop the real-time intelligence application, including the BAM business dashboard and problem resolution processes. The business rules that represent regulations, service-level agreements (SLAs) and TXU policies were provided by the senior manager of operations development and members of the Transaction Management team. The policies were implemented in software on the M3O platform. The dashboard was implemented using the drag-and-drop Visual Builder tool in the Flex-based M3O Operations component.

Runtime

The Vitria M3O Feed Server software component ingests business events from the application databases, converts them to XML, archives a copy in an event log (another Oracle database), and forwards the events to the M3O Analytic Server for processing. The overall volume of events is relatively modest — on average, fewer than 50 events per minute. Thousands of enrollment and disconnect/reconnect service requests are open at any one time, and each can have as many as 200 business events archived in the event log.
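The ingest step (convert to XML, archive, forward) can be sketched as follows. This is not the Feed Server's actual interface; it is a minimal illustration assuming hypothetical element names, an `event_log` table (SQLite stands in for the Oracle event log) and a `forward` callback representing the hand-off to the analytic engine. Archiving before forwarding is what makes later history browsing and replay possible.

```python
import sqlite3
import xml.etree.ElementTree as ET


def to_xml(source_app, fields):
    """Wrap one captured row as an XML business event (element names are hypothetical)."""
    event = ET.Element("businessEvent", source=source_app)
    for name, value in fields.items():
        ET.SubElement(event, name).text = str(value)
    return ET.tostring(event, encoding="unicode")


def archive_and_forward(log_conn, request_id, payload, forward):
    """Archive a copy in the event log first, then pass the event downstream."""
    log_conn.execute(
        "INSERT INTO event_log (request_id, payload) VALUES (?, ?)",
        (request_id, payload),
    )
    log_conn.commit()
    forward(payload)


log = sqlite3.connect(":memory:")
log.execute("CREATE TABLE event_log (request_id TEXT, payload TEXT)")
sent = []  # stands in for the analytic engine's input queue
xml_event = to_xml("SAP_CRM", {"requestId": "R-1", "status": "submitted"})
archive_and_forward(log, "R-1", xml_event, sent.append)
```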

The M3O Analytic Server runs the operational intelligence logic. It uses its internal complex-event processing (CEP) engine to find meaningful patterns of business events that constitute incidents, which are complex events that represent exceptions or other situations of interest. The Analytic Server also enriches events by inserting account or transaction identifiers to match business events with customer service requests. The Analytic Server sends incidents and other events to the M3O Business Process Management (BPM) system.
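Two of the Analytic Server's jobs, enrichment and pattern detection, can be illustrated with a small sketch. It is not M3O's CEP engine: it assumes a hypothetical lookup table from EDI control numbers to service-request ids, and it uses a simple sliding-window count (several rejections with the same reason within a few minutes) as one example of a "meaningful pattern" that becomes an incident, similar to the invalid-product-code rejections reported in the Results.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def enrich(event, edi_to_request):
    """Attach the service-request id so downstream stages can correlate events."""
    enriched = dict(event)
    if "request_id" not in enriched and "edi_control_no" in enriched:
        enriched["request_id"] = edi_to_request.get(enriched["edi_control_no"])
    return enriched


class RejectionBurstDetector:
    """Sliding-window pattern: `threshold` rejections with the same reason
    inside `window` become an incident. Assumes events arrive in time order."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.recent = defaultdict(deque)  # reason -> timestamps inside the window

    def on_event(self, event):
        if event.get("type") != "rejected":
            return None
        times = self.recent[event["reason"]]
        times.append(event["at"])
        # Drop timestamps that have slid out of the window.
        while event["at"] - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            return {"incident": "rejection_burst",
                    "reason": event["reason"], "count": len(times)}
        return None


# Hypothetical settings: three same-reason rejections within ten minutes.
detector = RejectionBurstDetector(threshold=3, window=timedelta(minutes=10))
t0 = datetime(2010, 7, 1, 9, 0)
```

Each enriched event would be passed to `detector.on_event`; a non-`None` result is the incident handed on to the BPM layer.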

The M3O BPM engine implements the tracking processes at runtime. It matches each incident to the history of its associated service request to see if the event occurred within an acceptable time window. Tracking processes are also used to identify the absence of events: situations in which an event that should have occurred didn't happen during the time window set by a time-based SLA or a TXU policy. For example, the system determines if a transaction was sent to ERCOT but a response was not received, or if a transaction was received from ERCOT but a TXU SAP application did not process it in a timely fashion. If a time threshold is exceeded, the BPM engine creates a case for a person in the Transaction Management team to resolve the problem. At this point, M3O BPM is actively orchestrating a resolution process by triggering a sequence of activities (it is no longer just monitoring a shadow process).
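Detecting the absence of an event needs a clock-driven check, because no incoming event arrives to trigger it. The sketch below shows the idea for the "sent to ERCOT, no response" case. It is illustrative only: the four-hour window and the `open_case` hand-off are hypothetical, and the real system expressed this logic as BPM tracking processes rather than Python.

```python
from datetime import datetime, timedelta


class ResponseWatch:
    """Detects the absence of an expected event inside a time window."""

    def __init__(self, window):
        self.window = window
        self.pending = {}  # request_id -> when the outbound transaction was seen

    def sent(self, request_id, at):
        self.pending[request_id] = at

    def responded(self, request_id):
        # The expected event arrived, so stop watching this request.
        self.pending.pop(request_id, None)

    def overdue(self, now):
        """Return late request ids; each is reported once, then dropped."""
        late = sorted(r for r, t in self.pending.items() if now - t > self.window)
        for r in late:
            del self.pending[r]
        return late


def open_case(request_id):
    # Hypothetical hand-off to a resolution process for a team member.
    return {"case_for": request_id, "reason": "no ERCOT response"}


watch = ResponseWatch(window=timedelta(hours=4))  # window length is hypothetical
t0 = datetime(2010, 6, 28, 9, 0)
watch.sent("R-1", t0)
watch.sent("R-2", t0)
watch.responded("R-2")
cases = [open_case(r) for r in watch.overdue(t0 + timedelta(hours=5))]
```

`overdue` would be called on a timer; reporting each request once avoids opening duplicate cases on every tick.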

M3O has an unusual two-way integration between the M3O Analytic Server and the BPM components. Some events from the BPM engine are fed back into the Analytic Server so that it can generate KPIs that reflect aggregate statistics. These KPIs are used to monitor the overall performance and SLAs of the business processes and are displayed on the business dashboards, complementing the analysis of individual service requests (process instances) described above. As in most operational intelligence systems, problem resolution occurs on two levels: the individual process instances (customer service requests) and the process at an aggregate level (a whole set of requests).
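Aggregate KPIs of this kind are computed over many finished process instances rather than one. As a hedged illustration, the sketch below derives two plausible KPIs from completed requests: the share finished on the promised day (TXU's stated goal is 100%) and the average elapsed hours. The record shape and KPI definitions are assumptions, not the KPIs TXU actually configured.

```python
from dataclasses import dataclass
from datetime import date, datetime
from statistics import mean


@dataclass
class Finished:
    request_id: str
    started: datetime
    finished: datetime
    promised: date  # the day the customer was promised service


def aggregate_kpis(items):
    """Aggregate statistics over completed process instances."""
    if not items:
        return {"count": 0, "on_time_pct": None, "avg_hours": None}
    on_time = sum(1 for i in items if i.finished.date() <= i.promised)
    hours = [(i.finished - i.started).total_seconds() / 3600 for i in items]
    return {
        "count": len(items),
        "on_time_pct": 100.0 * on_time / len(items),
        "avg_hours": mean(hours),
    }


sample = [
    Finished("R-1", datetime(2010, 7, 1, 9), datetime(2010, 7, 1, 13), date(2010, 7, 1)),
    Finished("R-2", datetime(2010, 7, 1, 9), datetime(2010, 7, 3, 9), date(2010, 7, 2)),
]
print(aggregate_kpis(sample))  # {'count': 2, 'on_time_pct': 50.0, 'avg_hours': 26.0}
```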

The system provides information to help Transaction Management team members determine why a service request didn't proceed within the expected time. The team member can optionally browse through the history of events for the case or replay the events to see what happened. For example, the person who entered the service request might have used an invalid product code to enroll a customer, so the transaction did not advance through the process. The system will send an alert and, if it is a high-priority enrollment, the system will also send an email to notify a watch officer to resolve the problem. The system also tracks compliance with regulatory SLAs, flagging reconnect requests that are not fulfilled within the allowed window.
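The regulatory check on reconnects and the priority-based notification can be sketched together. This assumes the two-hour window for remote reconnects described under Disconnect/Reconnect; windows for on-site work are not stated, so they are not modeled. The record shape and the channel names are hypothetical.

```python
from datetime import datetime, timedelta

REMOTE_RECONNECT_WINDOW = timedelta(hours=2)  # remote reconnects, from payment


def reconnect_breaches(requests, now):
    """Flag remote reconnects not completed within the window after payment.

    `requests` holds dicts with 'id', 'paid_at' and 'reconnected_at'
    (None while still open); this record shape is hypothetical.
    """
    flagged = []
    for r in requests:
        deadline = r["paid_at"] + REMOTE_RECONNECT_WINDOW
        done = r["reconnected_at"]
        if (done is not None and done > deadline) or (done is None and now > deadline):
            flagged.append(r["id"])
    return flagged


def channels(high_priority):
    """Every alert reaches the dashboard; high-priority ones also email a watch officer."""
    return ["dashboard", "email"] if high_priority else ["dashboard"]


def at(h, m=0):
    return datetime(2010, 10, 25, h, m)


reqs = [
    {"id": "RC-1", "paid_at": at(10), "reconnected_at": at(11, 30)},  # on time
    {"id": "RC-2", "paid_at": at(10), "reconnected_at": at(12, 30)},  # late
    {"id": "RC-3", "paid_at": at(10), "reconnected_at": None},        # open, overdue
    {"id": "RC-4", "paid_at": at(11), "reconnected_at": None},        # open, in time
]
flagged = reconnect_breaches(reqs, now=at(12, 15))
```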

The operational intelligence system has about 30 users. The dashboard graphically depicts the overall flow of service requests, along with counts of the transactions at each stage that have been started, have errors, or have been warned, held, canceled or completed (refer to the top of Figure 1). Aggregate-level problems in the overall process are reflected in green-blue-yellow-red "traffic light" indicators. For example, if the GSX Inovis BizLink or TLE applications go down, hundreds of service requests are affected, and red lights quickly appear. Again, email alerts are sent to several business managers in addition to the dashboard alerts. Details of each service request are displayed at the bottom of the dashboard. As part of the problem resolution process, the system can also generate an Excel spreadsheet to send to TDSPs to get the power turned on, bypassing the standard notification process to speed up service.
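The top panel of such a dashboard reduces to counting requests per stage and status, then turning the number of stuck requests at each stage into a light. The sketch below is a simplified illustration: it uses only green, yellow and red (the meaning of blue is not described here), and the thresholds and stage names are hypothetical. It mirrors the outage example: when a downstream application fails, errors pile up at one stage and that stage turns red.

```python
from collections import Counter


def stage_summary(instances, yellow_at, red_at):
    """Summarize (stage, status) pairs for a dashboard.

    Returns per-(stage, status) counts and one aggregate light per stage,
    driven by how many requests at that stage are in error or held.
    """
    counts = Counter(instances)
    lights = {}
    for stage in {stage for stage, _ in counts}:
        stuck = counts[(stage, "error")] + counts[(stage, "held")]
        if stuck >= red_at:
            lights[stage] = "red"
        elif stuck >= yellow_at:
            lights[stage] = "yellow"
        else:
            lights[stage] = "green"
    return counts, lights


# Hypothetical snapshot: an outage strands 150 requests at one stage.
snapshot = ([("sent_to_ercot", "error")] * 150
            + [("request_submitted", "error")] * 5
            + [("request_submitted", "completed")] * 20)
counts, lights = stage_summary(snapshot, yellow_at=10, red_at=100)
```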

Figure 1. Business Dashboard for Enrollment Process
Source: Gartner (September 2011)


Results

The implementation of this system was notably fast. Two developers built a limited proof-of-concept (POC) customer enrollment tracking application and put it through the IT quality assurance process in four weeks. It went into pilot production for one day, so that the business users could try it. However, users liked it so much that it remained in use until the full application went live four months later on 28 June 2010. The POC monitored five major milestones in the enrollment process; the full end-to-end system covers nine milestones and does more to manage problem resolution processes. The second application to monitor the disconnect/reconnect process went live on 23 October 2010. The whole system, monitoring both processes, required two software developers for eight months (not all full-time), plus part-time involvement from a TXU senior manager and a management consultant from Sendero.

In the first few weeks of operation, the new system identified certain sets of customer enrollment requests that took a long time to get through the process, because of missing data or invalid product codes. Some invalid requests were rejected by the SAP CRM application, and then were manually corrected by the Transaction Management team. Other requests were stalled without being noticed. The new operational intelligence system made the patterns of rejections and stalled transactions visible. Once the problem was understood, the enrollment process in the SAP CRM application was improved, and the overall level of problems dropped significantly.

The time to resolve ongoing, individual, one-off enrollment errors has also dropped significantly — "from hours to minutes," according to the TXU senior manager of operations development. Customer satisfaction ratings are at an all-time high, partly due to this operational intelligence system. TXU also implemented new staff training programs, clarified some of its products and made other improvements, so the contribution of the operational intelligence system cannot be determined precisely.

The system can tell the Transaction Management team that an application is down even before the IT department knows about it. Although this is fundamentally a BAM system, the IT department asked for (and received) access to the system to complement the IT operations management tools that it uses to manage its systems.

Critical Success Factors

This system was installed without disrupting the business applications that remained in operation throughout the project. Operational intelligence is an overlay with almost no change to the legacy applications and no noticeable impact on the performance of the business applications.

The outside consultants from Sendero had worked with TXU for years before this project, so they understood the business processes well.

The senior manager of operations development, who was responsible for the project from the business side, was familiar with the general nature of real-time operational intelligence and also understood the business requirements. This was important because, as is common in such projects, the ultimate users of the system in the Transaction Management team were focused on their regular duties and had limited time to spend providing ideas and specifications for the new system.

Administrators from the Transaction Management team are empowered to adjust some of the alerting thresholds on an ongoing basis. For example, they can adjust the number of minutes of a delay before a yellow warning or red alarm is signaled. This makes the system more flexible and increases their sense of ownership of the system.
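Keeping the thresholds as data rather than code is what lets business administrators adjust them. A minimal sketch, assuming a hypothetical JSON settings format keyed by milestone name, with validation so an edit cannot invert the yellow and red levels:

```python
import json
from dataclasses import dataclass


@dataclass
class DelayThresholds:
    yellow_minutes: int
    red_minutes: int

    def __post_init__(self):
        # Reject edits that would make the lights meaningless.
        if not 0 < self.yellow_minutes < self.red_minutes:
            raise ValueError("need 0 < yellow_minutes < red_minutes")

    def classify(self, delay_minutes):
        if delay_minutes >= self.red_minutes:
            return "red"
        if delay_minutes >= self.yellow_minutes:
            return "yellow"
        return "green"


def load_thresholds(text):
    """Parse administrator-edited settings: milestone -> thresholds."""
    return {m: DelayThresholds(**v) for m, v in json.loads(text).items()}


# Hypothetical settings an administrator might save for one milestone.
settings = load_thresholds('{"ercot_response": {"yellow_minutes": 60, "red_minutes": 240}}')
```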

The software vendor, Vitria, was directly involved in the project and was willing to develop some custom product extensions. M3O was relatively new when this project was implemented in mid-2010. Vitria added features such as traffic light display widgets, a JavaScript extension to send email alerts, a new timer based on fixed time and date (its timers previously counted only elapsed times), and an easier way to implement resolution processes that looped back and waited for something to change in the external environment.
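The difference between an elapsed-time timer and a fixed time-and-date timer is easy to show. In the hedged sketch below (not Vitria's timer API), the first fires a set duration after an event; the second fires at a wall-clock time, which suits calendar commitments such as completing a request on the day promised. The 17:00 cutoff is hypothetical, and time zones and business calendars are ignored.

```python
from datetime import datetime, time, timedelta


def elapsed_deadline(start, duration):
    """Elapsed-time timer: fires a fixed duration after the triggering event."""
    return start + duration


def fixed_time_deadline(start, cutoff, days_ahead=0):
    """Fixed time-and-date timer: fires at wall-clock `cutoff` on the start
    date plus `days_ahead`, rolling to the next day if that moment has passed.
    Uses naive datetimes; time zones and business calendars are ignored."""
    candidate = datetime.combine(start.date() + timedelta(days=days_ahead), cutoff)
    if candidate <= start:
        candidate += timedelta(days=1)
    return candidate


start = datetime(2010, 6, 28, 16, 30)
print(elapsed_deadline(start, timedelta(hours=2)))  # 2010-06-28 18:30:00
print(fixed_time_deadline(start, time(17, 0)))      # 2010-06-28 17:00:00
```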

Development was relatively simple because M3O provides sophisticated CEP, process monitoring, process orchestration and BAM visualization capabilities in one product with a common development studio. These components are integrated, so they exchange event data efficiently without the need for adapters or custom coding. TXU, Vitria and Sendero had a positive collaborative relationship that contributed to the speed of implementation and the success of the project.

Lessons Learned

End-to-end process monitoring reveals problems that cannot be detected by silo management reports and monitoring systems that track only individual application systems. End-to-end operational intelligence is highly relevant in a virtual enterprise (B2B) scenario such as this, where a process involves other companies or industry clearing houses.

Process analysts and business managers do not need to understand all the possible exceptions or problems that are occurring in their company before an operational intelligence system is deployed. An operational intelligence system reflects actual outcomes, such as the average elapsed time to transition through a business process or the number of transactions hung up at a particular stage. Thus, operational intelligence can reveal patterns of problems even before the causes are clear.

BAM does not require a workflow or process orchestration tool to actively drive the end-to-end process. The event data needed for end-to-end monitoring can be captured from DBMSs or other aspects of the individual applications. Operational intelligence systems provide value by finding and correcting problems that affect individual process instances (for example, a data entry error) and those that affect large groups of process instances (for example, a whole application system is down).

An end-to-end business process should be monitored at a coarse-grained, conceptual level. No more than seven to 10 significant milestones for each process should be shown on a business dashboard in most cases. Significant business events typically occur when a process crosses from one application system to another, from one company to another, or when a person finishes work on a task (for example, approves a customer transaction). In a few cases, a major milestone within an application system is significant enough to include in the process model. However, business people generally do not need or want to see fine-grained detail on the many minor events that happen within application systems. Systems that monitor at a detailed level are also more brittle, because they may have to be changed whenever the underlying applications are modified in small ways.
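The coarse-grained rule can be expressed as a filter that keeps boundary crossings and a short allow-list of in-application milestones. This is an illustrative sketch: the event shape and the `task_completed` kind are hypothetical. Because the filter depends only on where work moves and a few named milestones, small internal changes to an application rarely require changing it, which is the brittleness argument above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allow-list of in-application events important enough to keep,
# such as a person finishing a task (e.g. approving a customer transaction).
SIGNIFICANT_KINDS = {"task_completed"}


@dataclass(frozen=True)
class RawEvent:
    source: str            # application or company that emitted the event
    target: Optional[str]  # where the work went next; None if it stayed put
    kind: str


def is_significant(e: RawEvent) -> bool:
    """Keep application/company boundary crossings and allow-listed milestones."""
    crosses = e.target is not None and e.target != e.source
    return crosses or e.kind in SIGNIFICANT_KINDS


stream = [
    RawEvent("SAP_CRM", "SAP_ISU", "handoff"),
    RawEvent("SAP_CRM", None, "field_updated"),
    RawEvent("BizLink", "ERCOT", "edi_sent"),
    RawEvent("SAP_CRM", None, "task_completed"),
]
milestones = [e for e in stream if is_significant(e)]
```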

The process of developing a successful operational intelligence application is iterative. In this case study, the KPIs and the problem resolution processes were improved in several stages from the time that the initial POC system went into production, as users became more familiar with the system.

Source: Gartner Research G00219088, Roy Schulte, 21 September 2011