CLIKA Case Studies — Real Results

Bridging Model Compression and Edge Deployment

Realtime, on-device AI capabilities for patrol operations.

National-scale CCTV intelligence demands low-latency, low-cost deployment.

Questions?
We're here to assist!

How does CLIKA compression work?

The Automatic Compression Engine (ACE) SDK functions like a universal compiler, optimizer, and translator for all AI models, targeting every major hardware backend. ACE automatically generates a unique compression plan for every model. By analyzing the model's architecture alone, the software identifies and applies customized optimizations specific to that structure, creating a distinct 'recipe' without requiring any background information on the model itself.

What types of AI models does CLIKA's ACE support?

We support all types of AI models, including custom and fine-tuned models. The only current limitation is model size: models must be under 15B parameters. Support for larger models is coming soon.

Would it work on my custom model?

Yes. Our compression engine works on any AI model, as long as it is composed of layers that we support. Please refer to our docs page for the full list of supported layers.
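
As a rough illustration of what "composed of the layers that we support" means in practice, the sketch below compares a model's layer types against an allow-list. The allow-list and the function name are hypothetical and are not taken from CLIKA's documentation; the real list is on the docs page.

```python
# Hypothetical allow-list for illustration only; the actual supported
# layers are listed in CLIKA's docs, not here.
SUPPORTED_LAYERS = {
    "Conv", "Gemm", "MatMul", "Relu", "BatchNormalization", "Add", "Softmax",
}

def unsupported_layers(layer_types, supported=SUPPORTED_LAYERS):
    """Return the sorted layer types that are not in the allow-list."""
    return sorted(set(layer_types) - set(supported))

# Layer types as they might be read from an exported model graph.
model_layers = ["Conv", "BatchNormalization", "Relu", "MyCustomOp", "Gemm"]
print(unsupported_layers(model_layers))  # ['MyCustomOp']
```

An empty result would suggest the model uses only supported layers; anything else names the layers to check against the docs.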

What if I can't share my model or data?

No problem. Our ACE SDK runs on-premises or in air-gapped environments, so everything stays on your own machines. We can't see your private model or your data.

What types of hardware does CLIKA's ACE support?

We currently support NVIDIA (TRT, TRT-LLM) and Intel and AMD GPUs and CPUs (OpenVINO); Qualcomm support (QNN, Genie) is coming soon. CLIKA can support any hardware, as long as the target's inference framework supports the ONNX format. To ensure broad hardware compatibility, CLIKA continually reviews and updates its support for various inference frameworks by:

1. Analyzing the limitations and constraints of each framework on the target hardware, such as supported layers, operations, and reduced bitwidth precisions (e.g., 8-bit, 4-bit), and

2. Automatically converting unsupported elements into optimized, supported alternatives.

This enables CLIKA to output highly compressed ONNX models that fully leverage the hardware's acceleration capabilities.
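
The two steps above can be sketched roughly as follows. This is a simplified illustration, not CLIKA's implementation: the backend name, its capability table, and the rewrite table are all hypothetical, and the model is flattened to a list of op names rather than a real ONNX graph. The one rewrite shown relies on the ONNX definition of HardSwish as x * HardSigmoid(x).

```python
# Step 1 (assumed data): constraints recorded per backend.
# "backend_a" and its capabilities are made up for this example.
BACKENDS = {
    "backend_a": {
        "ops": {"Conv", "Relu", "HardSigmoid", "Mul", "MatMul"},
        "bits": {8, 16},
    },
}

# Step 2 (assumed data): unsupported op -> equivalent supported ops.
# HardSwish(x) = x * HardSigmoid(x) with alpha=1/6, beta=0.5 in ONNX.
REWRITES = {"HardSwish": ["HardSigmoid", "Mul"]}

def lower_for_backend(ops, requested_bits, backend):
    """Rewrite unsupported ops and pick a bitwidth the backend accepts."""
    caps = BACKENDS[backend]
    lowered = []
    for op in ops:
        if op in caps["ops"]:
            lowered.append(op)
        elif op in REWRITES and all(r in caps["ops"] for r in REWRITES[op]):
            lowered.extend(REWRITES[op])
        else:
            raise ValueError(f"no supported alternative for {op!r}")
    # Smallest supported bitwidth that is at least the requested one.
    usable = [b for b in sorted(caps["bits"]) if b >= requested_bits]
    if not usable:
        raise ValueError(f"{backend} has no bitwidth >= {requested_bits}")
    return lowered, usable[0]

print(lower_for_backend(["Conv", "HardSwish", "MatMul"], 4, "backend_a"))
# (['Conv', 'HardSigmoid', 'Mul', 'MatMul'], 8)
```

Here the 4-bit request falls back to 8-bit because the made-up backend does not list 4-bit support.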

What is the output of the CLIKA compression pipeline?

Every model imported into CLIKA ACE is (1) automatically compressed and (2) compiled to the target hardware format, resulting in (3) faster inference while (4) minimizing accuracy loss. Depending on the model type and the target hardware, the results can vary in terms of model size reduction and speed-up.
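
For a sense of scale on the size-reduction side, the back-of-the-envelope sketch below estimates weight storage at different bitwidths. It counts weights only and ignores activations, metadata, and any layers kept at higher precision; the 7B parameter count is an arbitrary example, not a CLIKA benchmark.

```python
def weight_size_gb(num_params, bits):
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return num_params * bits / 8 / 1e9

params = 7_000_000_000  # arbitrary example size
fp16 = weight_size_gb(params, 16)
int4 = weight_size_gb(params, 4)
print(fp16, int4, fp16 / int4)  # 14.0 3.5 4.0
```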

How can CLIKA preserve performance after compression?

CLIKA's compression engine calculates the "compressibility" of each component of the model based on the model architecture, statistically inferring how much model performance will change under different optimizations. This analysis allows the automation engine to safely apply the maximum possible compression to each part of the model. For the user, the complicated details of this process are handled automatically. This bypasses the extremely time-consuming (often 6+ months) process of manual model optimization and puts deployment-ready models into your hands in minutes.
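
One simple way to picture per-component compressibility is a rule that gives each layer the lowest bitwidth whose estimated accuracy drop stays within a tolerance. The sketch below assumes this rule; it is not CLIKA's actual algorithm, and the layer names and drop estimates are invented for illustration.

```python
def plan_bitwidths(est_drop, tolerance):
    """Pick, per layer, the lowest bitwidth whose estimated drop is tolerable.

    est_drop maps layer name -> {bitwidth: estimated accuracy drop}.
    If no bitwidth is within tolerance, the layer keeps its highest bitwidth.
    """
    plan = {}
    for layer, drops in est_drop.items():
        safe = [bits for bits, drop in drops.items() if drop <= tolerance]
        plan[layer] = min(safe) if safe else max(drops)
    return plan

# Invented sensitivity estimates (accuracy drop in percentage points).
estimates = {
    "conv1":     {4: 0.9, 8: 0.05, 16: 0.0},
    "block3.fc": {4: 0.1, 8: 0.02, 16: 0.0},
    "head":      {4: 2.5, 8: 0.6,  16: 0.0},
}
print(plan_bitwidths(estimates, tolerance=0.2))
# {'conv1': 8, 'block3.fc': 4, 'head': 16}
```

The robust layer drops to 4-bit while the sensitive head stays at 16-bit, which is the general idea behind compressing each part only as far as it can safely go.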

What types of techniques does CLIKA compression include?

In addition to quantization and pruning, CLIKA's compression engine also employs techniques such as:
- Layer Fusion (Horizontal/Vertical and Memory)
- Layer Replacement (substituting multiple layers with a single one when possible)
- Layer Simplification (reducing symbolic shapes and arithmetic complexity)
- Redundancy Removal (eliminating duplicate or unnecessary computations)
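
To make vertical layer fusion concrete, the sketch below folds an inference-time batch normalization into the linear layer that precedes it, a standard fusion that replaces two layers with one. It is a generic textbook example in plain Python, not CLIKA's code, and all function names are chosen for this illustration.

```python
import math

def linear(x, W, b):
    """y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch norm applied per output channel."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]

def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Return (W', b') such that linear(x, W', b') == batchnorm(linear(x, W, b))."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    W_folded = [[s * w for w in row] for row, s in zip(W, scale)]
    b_folded = [s * (bi - m) + bt for s, bi, m, bt in zip(scale, b, mean, beta)]
    return W_folded, b_folded

W, b = [[1.0, 2.0], [0.5, -1.0]], [0.1, -0.2]
gamma, beta, mean, var = [1.5, 0.8], [0.0, 0.3], [0.2, -0.1], [4.0, 0.25]
x = [3.0, -2.0]

two_layers = batchnorm(linear(x, W, b), gamma, beta, mean, var)
W_f, b_f = fold_batchnorm(W, b, gamma, beta, mean, var)
one_layer = linear(x, W_f, b_f)
print(all(math.isclose(p, q, abs_tol=1e-9) for p, q in zip(two_layers, one_layer)))  # True
```

The fused layer produces the same outputs with one fewer operation and one fewer pass over memory.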