I can understand compression for L2, but not L1.
L1 is all about speed - which includes IOPS, especially for random small blocks, since that's how programs generally access data (i.e. lots of small reads, even when the data is logically sequential). Unless you're just caching text files, which compress well, you might only get a 2x space improvement at best. Worst case, for video and audio you'd get no space improvement (already compressed, so effectively incompressible) and only slow things down. Adding more code to that path - more work, and vastly more complex cache management - doesn't make much sense: the gain from more effective space vs. the slowdown from CPU load and latency would be a wash - or worse.
REFERENCE: PCIe 4.0 x4 (i.e. a typical NVMe link) tops out at ~8GB/s; DDR4 is ~20-25GB/s. So the ratio is only ~3:1 on current tech - meaning you can't add much code to the path before you negatively impact overall application performance.
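To put a rough number on that, here's a sketch of the L1 argument. The RAM figure is from the numbers above; the decompression speed is purely my assumption (roughly what a single core manages with a fast LZ-class codec). If decompression sits serially in the read path, the effective rate is the harmonic combination of the stages, and a slow decompressor drags the whole path down:

```python
def serial_throughput_gbps(*stage_gbps):
    """Effective throughput of stages traversed one after another:
    total time per byte is the sum of each stage's time per byte."""
    return 1.0 / sum(1.0 / s for s in stage_gbps)

ram = 22.0  # DDR4, ~20-25 GB/s per the reference above

# Assumed: a single core decompresses at ~2 GB/s.
plain_hit = serial_throughput_gbps(ram)        # raw L1 cache read
comp_hit = serial_throughput_gbps(ram, 2.0)    # with inline decompression
print(f"plain hit: {plain_hit:.1f} GB/s, compressed hit: {comp_hit:.2f} GB/s")
```

Under those assumptions a compressed L1 hit delivers less than the ~8GB/s the NVMe could do on its own - i.e. the "cache hit" could be slower than skipping the cache entirely, which is the whole point above.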
L2, however, has a much higher speed ratio between the devices, so a little CPU time is more worth spending to avoid a very slow access (depending on hit ratios, i.e. the cache-to-HDD size ratio). This was another Stacker advantage (and I owned one of their early ISA cards!): even on a slow CPU (80286), the time to read 2 sectors from the HDD was sooo much longer than the time to read a single 2x-compressed sector and decompress it that it netted an overall speed improvement - AND doubled your drive's capacity.
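The Stacker trade-off is easy to model. A sketch using the modern HDD figures from the REFERENCE lines below (~200MB/s, ~10ms seek); the CPU decompression rate is my assumption, not a measured number:

```python
def raw_read_time(sectors, sector_bytes, throughput_bps, seek_s):
    """Time to read `sectors` uncompressed sectors from the device."""
    return seek_s + sectors * sector_bytes / throughput_bps

def compressed_read_time(sectors, ratio, sector_bytes, throughput_bps,
                         seek_s, decompress_bps):
    """Read sectors/ratio physical sectors, then decompress the full
    logical payload on the CPU."""
    io = seek_s + (sectors / ratio) * sector_bytes / throughput_bps
    cpu = sectors * sector_bytes / decompress_bps
    return io + cpu

# HDD: ~200 MB/s sequential, ~10 ms average seek (per the post).
hdd = dict(sector_bytes=4096, throughput_bps=200e6, seek_s=10e-3)

raw = raw_read_time(2, **hdd)
# Assumed: the CPU decompresses at ~1 GB/s.
comp = compressed_read_time(2, 2.0, decompress_bps=1e9, **hdd)
print(f"raw: {raw*1e3:.3f} ms, compressed: {comp*1e3:.3f} ms")
```

With a fast enough decompressor, halving the bytes transferred more than covers the CPU cost, so the compressed read wins - and the margin was far bigger in the Stacker era, when HDD transfer rates were a tiny fraction of today's.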
Of course the numbers are different from the 286 days, but I imagine the ratios aren't that much different. So I would think this might be an interesting advantage for L2 (especially if Romex increases the max cache size).
REFERENCE: HDD throughput is ~200MB/s, so roughly 40 times slower than PCIe 4.0.
The latency gap is even bigger: NVMe is ~100 *MICRO*seconds at 50k IOPS, while HDDs are ~10 *MILLI*seconds (~100x slower).
IOPS is even more skewed: a typical SATA 7,200rpm HDD does less than 100 IOPS; the Samsung 980 Pro PCIe 4.0 NVMe claims 1,000,000 - about 10,000x higher.
So in the case of large HDD-to-L2 size ratios, I would think the higher cache hit rate and time saved by having 2x the data in cache would more than pay for the compression penalty - but I'd still want it as an option, to decide based on my workload, HDD and L2 size, etc.
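That hit-rate argument can be sketched with expected access times. The latencies are the ones quoted above; the hit rates (and the notion that 2x cached data lifts the hit rate from 60% to 75%) are purely illustrative assumptions - real numbers depend entirely on the workload:

```python
def avg_access_ms(hit_rate, cache_ms, hdd_ms, decompress_ms=0.0):
    """Expected access time: hits are served from cache (plus any
    decompression cost), misses fall through to the HDD."""
    return hit_rate * (cache_ms + decompress_ms) + (1 - hit_rate) * hdd_ms

# Latencies from the post: ~0.1 ms NVMe, ~10 ms HDD.
nvme_ms, hdd_ms = 0.1, 10.0

# Assumed hit rates: 60% plain, 75% with 2x the data cached;
# assumed ~0.05 ms extra per hit for decompression.
plain = avg_access_ms(0.60, nvme_ms, hdd_ms)
compressed = avg_access_ms(0.75, nvme_ms, hdd_ms, decompress_ms=0.05)
print(f"plain: {plain:.2f} ms, compressed: {compressed:.2f} ms")
```

Because each avoided HDD miss saves ~10 ms while decompression adds a small fraction of a millisecond per hit, even a modest hit-rate bump dominates - which is why the option looks attractive when the HDD is much larger than the L2.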
That said, I get that managing variable-length blocks due to varying compression rates is a nightmare! (but it sounds like a fun problem to at least model, if not solve)