Big Data and Risk Management: Cracking the Code

Monday, July 02, 2012, By Amir Halfon


In the financial services industry, risk management and Big Data -- the popular buzzword for the enormous sets of structured and unstructured data that institutions face as their enterprises go digital -- go hand in hand. The two cannot be separated, and the opportunities and challenges of each must be addressed in tandem.

Fueled by the financial crisis of 2008 and ongoing uncertainty in Europe, regulatory bodies and, by extension, the industry are focused like never before on identifying, measuring, and managing risk exposure across asset classes, lines of business, and enterprises. Managing large amounts of data (including positions, reference, market data, etc.) is a critical component in the accurate assessment of risk, and is one of the reasons Big Data management has recently ascended to top-of-mind status among C-level executives and regulators. While prudent financial services organizations recognize this strategic shift, many are left wondering how to leverage the value of growing amounts of data.

To date, most Big Data discussions have focused on web-based companies such as Google and Facebook and the large amounts of unstructured data they generate. There's been a lot of attention to harnessing that data for commercial goals, and certainly the banking industry is examining these possibilities. One could argue, however, that the more urgent task is that of harnessing the value of data generated and collected and applying insights to address critical business concerns such as risk management.

This article will focus on how Big Data is transforming the industry, the components that make up Big Data, and the technology strategies financial organizations can use to manage this transformation efficiently and with a focus on innovation.

Big Data has many definitions, but key components can be categorized around the four Vs: volume, velocity, variety and value.

Handling Large Volumes

The web is becoming the world's central data store, and as such provides a rich source of information on everything from public sentiment to customer behavior and market intelligence. But the web is not the only place seeing explosive growth in data volumes. Our industry has witnessed exponential growth in trade data, beginning with electronic markets and skyrocketing with market fragmentation and the widespread use of algorithmic, program and high-frequency trading. Increased volumes also mean there are much larger amounts of historical tick and positions data that need to be analyzed. New regulations require ever more extensive data retention and analysis, and sophisticated strategy development requires growing amounts of historical data for backtesting.

Many systems are struggling to keep up with these volumes of data while still performing primary or business-critical tasks. The challenge financial services organizations are facing now is strategizing how to keep up with the sheer quantity of data generated on a continuous basis.

The most relevant technical strategy for managing growing data volumes is parallelism. While the industry has spent a lot of effort parallelizing computation, data parallelism remains a challenge and is the focal point of most current IT projects. Additionally, it is becoming apparent that in many compute grids the bottleneck is data access rather than computation. As a result, the pattern of moving computing tasks to the data, rather than moving large amounts of data over the network, is becoming increasingly prevalent.
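The pattern of moving the computing task to the data can be illustrated with a small sketch. It is not taken from any particular product: the `Partition` class, the `(desk, exposure)` record layout, and the function names are all assumptions made for illustration. Each partition runs the task against the rows it owns, and only the small per-partition result is sent back to be combined.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # (desk, exposure); hypothetical record layout


class Partition:
    """Hypothetical data node that owns one slice of the positions data."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def run(self, task: Callable[[List[Row]], Dict[str, float]]) -> Dict[str, float]:
        # The task executes where the rows live; only its small result leaves.
        return task(self._rows)


def exposure_by_desk(rows: List[Row]) -> Dict[str, float]:
    """Task shipped to each partition: sum exposure per desk locally."""
    totals: Dict[str, float] = {}
    for desk, exposure in rows:
        totals[desk] = totals.get(desk, 0.0) + exposure
    return totals


def merge(partials: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Combine the per-partition summaries into one result."""
    combined: Dict[str, float] = {}
    for partial in partials:
        for desk, exposure in partial.items():
            combined[desk] = combined.get(desk, 0.0) + exposure
    return combined


# Two partitions with made-up positions; in a real system these would be
# separate machines, and run() would be a remote call.
partitions = [
    Partition([("rates", 100.0), ("fx", 50.0)]),
    Partition([("rates", 75.0)]),
]
total = merge(p.run(exposure_by_desk) for p in partitions)
```

Only one small dictionary per partition crosses the network in this arrangement, whatever the number of rows each partition holds.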

Several technical approaches combine these strategies, parallelizing both data management and computation, while bringing compute tasks close to the data:

Engineered machines integrate software and hardware mechanisms, combining data and compute parallelization with partitioning, compression, and a high-bandwidth backplane to provide very high throughput for data processing while minimizing data movement.

Integrated analytics also involves moving computation to the data rather than the other way around. Whether it's OLAP (online analytical processing), predictive or statistical analytics, modern databases are capable of doing a lot of computation right where the data is stored.

Data grids focus on maximizing data parallelism by distributing in-memory data objects across a large, horizontally scaled cluster, and some even provide the ability to ship compute tasks to the nodes holding the data in memory, rather than sending data to compute nodes as most grids do.

NoSQL, or schema-less data management, has been gaining momentum. At its core is the notion that developers can be more productive by circumventing the need for complex schema design during the development lifecycle of data-intensive applications, especially when the data lends itself to key-value modeling (e.g. time series data).

Hadoop is a complete open-source stack for storing and analyzing massive amounts of data, and is quickly becoming a de facto standard, with multiple distributions available. Like the technologies mentioned above, the Hadoop framework achieves massive scalability by sending compute tasks to the nodes storing the data, and a rich ecosystem of analytical tools offers high level functionality on top of that.
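To make the key-value modeling of time series data mentioned above more concrete, here is a minimal in-memory sketch. It does not represent the API of any NoSQL product; the `TickStore` class, its `put` and `scan` methods, and the symbols used are hypothetical. Ticks are stored under a composite `(symbol, timestamp)` key with no schema declared up front, and a time window is read back as a range scan over the sorted keys.

```python
import bisect
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (symbol, timestamp); hypothetical composite key


class TickStore:
    """Hypothetical in-memory key-value store for tick data."""

    def __init__(self) -> None:
        self._keys: List[Key] = []           # kept sorted to allow range scans
        self._values: Dict[Key, float] = {}  # key -> price

    def put(self, symbol: str, ts: int, price: float) -> None:
        key = (symbol, ts)
        if key not in self._values:
            bisect.insort(self._keys, key)   # insert while preserving order
        self._values[key] = price

    def scan(self, symbol: str, start: int, end: int) -> List[Tuple[int, float]]:
        """Return (timestamp, price) pairs with start <= timestamp < end."""
        lo = bisect.bisect_left(self._keys, (symbol, start))
        hi = bisect.bisect_left(self._keys, (symbol, end))
        return [(ts, self._values[(sym, ts)]) for sym, ts in self._keys[lo:hi]]


store = TickStore()
store.put("ABC", 3, 10.5)   # made-up ticks, inserted out of order
store.put("ABC", 1, 10.0)
store.put("XYZ", 2, 99.0)
store.put("ABC", 2, 10.25)
window = store.scan("ABC", 1, 3)
```

Because the keys sort by symbol and then by timestamp, all ticks for one symbol sit next to each other, which is what makes the window query a simple slice.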
