One of the most challenging data management problems in the industry involves seismic files. Some oil and gas companies estimate that they acquire a petabyte of data per day or more. Domain knowledge and specific approaches are required to move, access, and visualize that data.
In this blog post, we will dive into the details of modern technology that can be used to speed up access to this data. We will also cover common challenges around seismic visualization, how INT helps solve these challenges with advanced compression and decompression techniques, how INT uses vectorization to speed up compression, and more.
What Is IVAAP?
IVAAP is a data visualization platform that accelerates the delivery of cloud-enabled geoscience, drilling, and production solutions.
- IVAAP Client offers flexible dashboards, 2D & 3D widgets, sessions, and templates
- IVAAP Server side connects to multiple data sources, integrates with your workflows, and offers real-time services
- IVAAP Admin client manages user access and projects
Server – Client Interaction
The client requests the list of available files from the server, the server returns the file list, the user chooses a file to display, and then the server starts sending chunks of data while the client displays them.
Some issues encountered with this scheme include:
- Seismic data files are huge in size — they can be hundreds of gigabytes or even terabytes.
- Because of the file size, it takes too much time to transfer files over the network.
- The network may not have enough bandwidth.
The goals of this scheme are to:
- Speed up file transfer time
- Reduce data size for transfer
- Add user controls for different network bandwidths
And the solution:
- We decided to implement server-side compression and client-side decompression. We also decided to give the client a parameter we call the acceptable error level, which bounds how far the data may deviate from the original after the compression/decompression process.
By taking a closer look at the compression and decompression flow, we can see that the original seismic data goes through a set of five transformations: AGC, Normalization, Haar Wavelets, Quantization, and Huffman. The result of these transformations is a compressed file that can be sent to clients over the network. On the client's side, the decompression process runs the same steps in reverse order, from inverse Huffman to inverse AGC. This is how the client gets its data back. The client does not get a bit-exact copy of the original data; it gets the data as it stands after the compression and decompression process. That's why we added an acceptable error level for that process: we have different scenarios where clients don't always require the full original data at the full level of precision. For example, sometimes the client only needs to review the seismic data. Using this acceptable error level, they can control how much data is passed over the network and, of course, speed up the process.
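To make the order of these steps concrete, here is an interface-level sketch of the pipeline in C++ (stage implementations omitted). The function names are illustrative rather than INT's actual API; only the stage order and the acceptable error level parameter come from the description above.

```cpp
// Interface-level sketch of the five-stage pipeline described above (stage
// implementations omitted). Names are illustrative, not INT's actual API.
#include <cstdint>
#include <vector>

// Forward (server-side) stages, applied in this order:
std::vector<float>   applyAGC(const std::vector<float>& trace);        // automatic gain control
std::vector<float>   normalize(const std::vector<float>& trace);
std::vector<float>   haarTransform(const std::vector<float>& trace);   // Haar wavelets
std::vector<int16_t> quantize(const std::vector<float>& coeffs,
                              float acceptableError);                  // lossy step, bounded by the error level
std::vector<uint8_t> huffmanEncode(const std::vector<int16_t>& symbols);

// Inverse (client-side) stages, applied in the reverse order:
std::vector<int16_t> huffmanDecode(const std::vector<uint8_t>& bytes);
std::vector<float>   dequantize(const std::vector<int16_t>& symbols);
std::vector<float>   inverseHaarTransform(const std::vector<float>& coeffs);
std::vector<float>   denormalize(const std::vector<float>& trace);
std::vector<float>   inverseAGC(const std::vector<float>& trace);

// Server: compress one trace with a client-chosen acceptable error level.
inline std::vector<uint8_t> compress(const std::vector<float>& trace, float acceptableError) {
    return huffmanEncode(quantize(haarTransform(normalize(applyAGC(trace))), acceptableError));
}

// Client: reconstruct an approximation of the trace, accurate to within that error level.
inline std::vector<float> decompress(const std::vector<uint8_t>& bytes) {
    return inverseAGC(denormalize(inverseHaarTransform(dequantize(huffmanDecode(bytes)))));
}
```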
The resulting scheme looks like this:
The client requests a file list from the server, the user chooses a file to display, the server compresses the data and sends it to the client, the client decompresses it, and finally displays it. This is repeated for each tile to display.
So why not use an existing compression scheme like GZIP, LZ/Deflate, etc.? We tried them, but we found that this type of general-purpose compression is not as effective as we'd like on our seismic data.
Server-Side Speedup
The primary objective was to speed up the current implementation of compression and decompression on both the server and client side.
The proposal:
- Server-side compression is implemented in Java, so we decided to create a C++ implementation of the compression sequence and use a JNI layer to call the native methods. Client-side decompression is implemented in JavaScript, so we decided to create a C++ implementation of decompression and use WebAssembly (WASM) to integrate the C++ code into JS. (A hypothetical sketch of the JNI bridge follows this list.)
- We implemented both compression and decompression algorithms in C++, but after comparing the results and performance of C++ and Java, we discovered that C++ was only about 1.5 times faster than a warmed-up JVM. That's why we decided to go further and apply SIMD instructions for additional speedup.
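The JNI layer itself is not reproduced in this post, so here is a hypothetical sketch of what such a bridge could look like: a Java native method (the com.example.seismic.NativeCompressor class and the compressTrace entry point are assumed names, not IVAAP's) receives a float[] trace, hands it to native C++ code, and returns the compressed bytes.

```cpp
// Hypothetical JNI bridge sketch; class, method, and function names are illustrative.
#include <jni.h>
#include <vector>

// Native C++ compression entry point (assumed to exist elsewhere in the native library).
std::vector<unsigned char> compressTrace(const float* samples, int count);

extern "C" JNIEXPORT jbyteArray JNICALL
Java_com_example_seismic_NativeCompressor_compress(JNIEnv* env, jobject /*self*/,
                                                   jfloatArray traceSamples) {
    // Pin the Java float[] and run the native compression on it.
    jsize count = env->GetArrayLength(traceSamples);
    jfloat* samples = env->GetFloatArrayElements(traceSamples, nullptr);

    std::vector<unsigned char> compressed = compressTrace(samples, count);

    // Read-only access: release without copying changes back.
    env->ReleaseFloatArrayElements(traceSamples, samples, JNI_ABORT);

    // Copy the compressed bytes into a Java byte[] for the caller.
    jbyteArray result = env->NewByteArray(static_cast<jsize>(compressed.size()));
    env->SetByteArrayRegion(result, 0, static_cast<jsize>(compressed.size()),
                            reinterpret_cast<const jbyte*>(compressed.data()));
    return result;
}
```

On the Java side, the matching declaration would be a native method in that class, loaded with System.loadLibrary.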
Single Instruction Multiple Data (SIMD)
SIMD architecture performs the same operation on multiple data elements in parallel. With scalar operations, you have to perform four separate calculations to get four results; with a SIMD operation, a single vector instruction produces all four results at once.
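As a toy illustration (unrelated to the seismic code itself), summing four pairs of floats takes four scalar additions, while a single SSE instruction adds all four lanes at once:

```cpp
// Toy comparison of scalar vs. SIMD: four scalar adds vs. one _mm_add_ps.
#include <xmmintrin.h>  // SSE intrinsics
#include <cstdio>

int main() {
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float scalarSum[4], simdSum[4];

    // Scalar: four separate additions.
    for (int i = 0; i < 4; ++i) scalarSum[i] = a[i] + b[i];

    // SIMD: one vector instruction adds all four lanes.
    _mm_storeu_ps(simdSum, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));

    for (int i = 0; i < 4; ++i)
        std::printf("%g %g\n", scalarSum[i], simdSum[i]);  // identical results
    return 0;
}
```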
SIMD benefits:
- Allows processing of several data values with one single instruction.
- Much faster computation on predefined computation patterns.
SIMD drawbacks:
- SIMD operations cannot be used to process multiple data in different ways.
- SIMD operations can only be applied to predefined processing patterns with independent data handling.
Normalization: C++ scalar implementation
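The scalar code itself is not reproduced in this post, so here is a minimal sketch under the assumption that normalization scales each trace by its peak absolute amplitude; the names and details are illustrative, not INT's implementation.

```cpp
// Hypothetical scalar sketch of a normalization stage: scale the trace so its
// peak absolute amplitude becomes 1.
#include <algorithm>
#include <cmath>
#include <cstddef>

void normalizeScalar(float* samples, std::size_t count) {
    // Pass 1: find the largest magnitude, one sample at a time.
    float peak = 0.0f;
    for (std::size_t i = 0; i < count; ++i)
        peak = std::max(peak, std::fabs(samples[i]));

    if (peak == 0.0f) return;             // all-zero trace: nothing to scale

    // Pass 2: one multiply per sample.
    const float scale = 1.0f / peak;
    for (std::size_t i = 0; i < count; ++i)
        samples[i] *= scale;
}
```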
Normalization: C++ SIMD SSE implementation
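Again as a sketch rather than INT's actual code, the same loops can be rewritten with SSE intrinsics so that both the peak search and the scaling handle four floats per instruction:

```cpp
// Hypothetical SSE version of the same normalization, four floats per instruction.
#include <xmmintrin.h>   // SSE intrinsics
#include <algorithm>
#include <cmath>
#include <cstddef>

void normalizeSSE(float* samples, std::size_t count) {
    // Pass 1: find the peak magnitude four samples at a time.
    const __m128 signMask = _mm_set1_ps(-0.0f);          // only the sign bit set
    __m128 vPeak = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        __m128 v = _mm_loadu_ps(samples + i);
        vPeak = _mm_max_ps(vPeak, _mm_andnot_ps(signMask, v));  // |v|: clear the sign bit
    }
    float lanes[4];
    _mm_storeu_ps(lanes, vPeak);
    float peak = std::max(std::max(lanes[0], lanes[1]), std::max(lanes[2], lanes[3]));
    for (; i < count; ++i)
        peak = std::max(peak, std::fabs(samples[i]));    // scalar tail

    if (peak == 0.0f) return;                            // all-zero trace
    const float scale = 1.0f / peak;

    // Pass 2: scale four samples per multiply instruction.
    const __m128 vScale = _mm_set1_ps(scale);
    for (i = 0; i + 4 <= count; i += 4)
        _mm_storeu_ps(samples + i, _mm_mul_ps(_mm_loadu_ps(samples + i), vScale));
    for (; i < count; ++i)
        samples[i] *= scale;                             // scalar tail
}
```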
Server-Side Speedup Results
There are different types of speedup for different algorithms:
- Normalization is 9 times faster than the scalar C++ version
- Haar Wavelets is 6 times faster than the scalar C++ version
- Huffman has no performance increase (not a vectorizable algorithm)
Overall, with SIMD C++ code, server-side compression is around 3 times faster than the Java version. This was a good result for us, so we decided to move on to the client-side speedup.
Client-Side Speedup
For the client-side speedup, we implemented decompression algorithms in C++ and used WASM to integrate the C++ code in JavaScript.
WebAssembly
WASM is:
- A binary executable format that can run in browsers
- A low-level virtual machine
- A compilation target for high-level languages such as C/C++
WASM is not:
- A programming language
- Tied to the web; it can also run outside the browser
Steps to get WASM working:
- Compile C/C++ code with Emscripten to obtain a WASM binary
- Bind WASM binary to the page using a JavaScript “glue code”
- Run the app and let the browser instantiate the WASM module, its memory, and its table of references. Once that is done, the web app is fully operational.
C++ Code to Integrate (TaperFilter.h/cpp)
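The original TaperFilter.h/cpp is not reproduced here, so as a stand-in, here is a hypothetical, self-contained class of the same flavor (a raised-cosine edge taper), small enough to expose to JavaScript:

```cpp
// TaperFilter.h -- a hypothetical stand-in for the class named above; the real
// IVAAP TaperFilter is not reproduced in this post. It smoothly ramps the edges
// of a trace toward zero.
#pragma once
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

class TaperFilter {
public:
    explicit TaperFilter(int taperLength)
        : taperLength_(taperLength > 0 ? static_cast<std::size_t>(taperLength) : 0) {}

    // Apply a raised-cosine weight to the first and last taperLength_ samples.
    std::vector<float> apply(const std::vector<float>& trace) const {
        std::vector<float> out(trace);
        const std::size_t n = out.size();
        const std::size_t len = std::min(taperLength_, n / 2);
        for (std::size_t i = 0; i < len; ++i) {
            const float x = static_cast<float>(i) / static_cast<float>(len);
            const float w = 0.5f * (1.0f - std::cos(3.14159265f * x));  // raised-cosine weight
            out[i] *= w;              // leading edge
            out[n - 1 - i] *= w;      // trailing edge
        }
        return out;
    }

private:
    std::size_t taperLength_;
};
```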
Emscripten Bindings
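Here is a hedged sketch of what embind bindings for that class could look like (the module name and exported names are assumptions):

```cpp
// Hypothetical embind sketch for the TaperFilter sketch above. Compiled with
// something like:  emcc TaperFilter.cpp bindings.cpp --bind -o taper.js
#include <emscripten/bind.h>
#include "TaperFilter.h"

EMSCRIPTEN_BINDINGS(taper_filter_module) {
    // Expose std::vector<float> so traces can cross the JS/WASM boundary.
    emscripten::register_vector<float>("VectorFloat");

    // Expose the class, its constructor, and the apply() method to JavaScript.
    emscripten::class_<TaperFilter>("TaperFilter")
        .constructor<int>()
        .function("apply", &TaperFilter::apply);
}
```

After compiling with Emscripten, the generated glue code exposes these names on the Emscripten Module object, which is how the integration example below would use them.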
WebAssembly Integration Example
Client-Side Speedup Takeaways:
- Emscripten supports the WebAssembly SIMD proposal
- Vectorized code will be executed by browsers
- The results of vectorization for the decompression algorithms are:
  - Inv Normalization: 6 times speedup
  - Inv Haar Wavelets: 10 times speedup
  - Inv Huffman: no performance improvement (not vectorizable)
Overall, client-side decompression with vectorized C++ code was around 6 times faster than the JavaScript version.
For more information on GeoToolkit, please visit int.com/geotoolkit/ or check out our webinar, “How to Get the Best Performance of Your Seismic Web Applications.”