Implement adaptive stretch parameters

## Description
Research and implement adaptive stretch parameters that can be learned during training.

## Motivation
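The stretch parameters (gamma, zeta) define the interval to which a concrete sample is stretched before it is clamped to [0, 1], which determines how much probability mass falls exactly on 0 and 1. Learning them could improve convergence and final sparsity quality.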
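For reference, here is a minimal scalar sketch of where gamma and zeta enter the sampling step. It follows the standard hard concrete formulation; the names `log_alpha`, `temperature`, and `u`, and the default values, are assumptions for illustration and not taken from this repository.

```python
import math
import random


def hard_concrete_sample(log_alpha, gamma=-0.1, zeta=1.1, temperature=2 / 3, u=None):
    """Draw one hard concrete gate value in [0, 1] (illustrative scalar version)."""
    if u is None:
        # Uniform noise, kept away from 0 and 1 so the logs stay finite
        u = random.uniform(1e-6, 1 - 1e-6)
    # Binary concrete sample in (0, 1)
    logits = (math.log(u) - math.log(1 - u) + log_alpha) / temperature
    s = 1.0 / (1.0 + math.exp(-logits))
    # Stretch to (gamma, zeta), then clamp back to [0, 1]
    s_stretched = s * (zeta - gamma) + gamma
    return min(1.0, max(0.0, s_stretched))
```

With gamma < 0 and zeta > 1, samples near the ends of (0, 1) are pushed past the boundary and clamped, so the gate can be exactly 0 or exactly 1. With gamma = 0 and zeta = 1 the clamp never triggers.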

## Research Questions
1. Should stretch be global, per-layer, or per-weight?
2. How should stretch be constrained so the distribution stays valid (gamma < 0 and zeta > 1)?
3. Does adaptive stretch improve final accuracy/sparsity?
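To make question 1 concrete, the three granularities mostly differ in the shape and ownership of the learned tensor. A rough sketch, assuming PyTorch; the helper `make_stretch_param` and its arguments are hypothetical and only illustrate the options:

```python
import torch
import torch.nn as nn


def make_stretch_param(gate_size, granularity="per_layer", init_stretch=0.1, shared=None):
    """Return a stretch parameter for one gate module (hypothetical helper)."""
    if granularity == "global":
        # One scalar Parameter created elsewhere and reused by every gate module
        if shared is None:
            raise ValueError("global granularity needs a shared nn.Parameter")
        return shared
    if granularity == "per_layer":
        # One scalar per gate module
        return nn.Parameter(torch.tensor(init_stretch))
    if granularity == "per_weight":
        # One value per gate; applies elementwise to gates of the same shape
        return nn.Parameter(torch.full(tuple(gate_size), init_stretch))
    raise ValueError(f"unknown granularity: {granularity!r}")
```

Per-weight stretch doubles the number of gate parameters, so it may be worth checking whether the extra capacity changes the results before committing to it.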

## Proposed Implementation
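```python
import torch
import torch.nn as nn


class AdaptiveHardConcrete(nn.Module):
    def __init__(self, *gate_size, init_stretch=0.1, adaptive=True):
        super().__init__()
        ...
        if adaptive:
            # Learn a single scalar stretch; gamma and zeta are both derived from it
            self.stretch = nn.Parameter(torch.tensor(init_stretch))
        else:
            # Fixed stretch, stored as a buffer so it follows the module's device and state_dict
            self.register_buffer('stretch', torch.tensor(init_stretch))

    @property
    def gamma(self):
        # Lower stretch bound, constrained to [-0.5, -0.01].
        # Caveat: clamp passes zero gradient once stretch leaves [0.01, 0.5].
        return -self.stretch.clamp(0.01, 0.5)

    @property
    def zeta(self):
        # Upper stretch bound, constrained to [1.01, 1.5]
        return 1.0 + self.stretch.clamp(0.01, 0.5)
```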
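One possible alternative to clamping, relevant to research question 2: `torch.clamp` has zero gradient for inputs outside its range, so a learned stretch that drifts past a bound stops receiving updates. A sigmoid reparameterization keeps the value strictly inside the range with a nonzero gradient everywhere. This is a sketch under assumptions, not a settled design; the class name `SigmoidStretch` and the attribute `raw` are hypothetical, and the bounds reuse the 0.01 and 0.5 from the proposal above.

```python
import math

import torch
import torch.nn as nn


class SigmoidStretch(nn.Module):
    """Learned stretch kept in (low, high) via a sigmoid (illustrative sketch)."""

    def __init__(self, init_stretch=0.1, low=0.01, high=0.5):
        super().__init__()
        if not low < init_stretch < high:
            raise ValueError("init_stretch must lie strictly between low and high")
        self.low, self.high = low, high
        # Invert the sigmoid so that stretch starts exactly at init_stretch
        frac = (init_stretch - low) / (high - low)
        self.raw = nn.Parameter(torch.tensor(math.log(frac / (1.0 - frac))))

    @property
    def stretch(self):
        # Smooth map from the unconstrained parameter to (low, high)
        return self.low + (self.high - self.low) * torch.sigmoid(self.raw)

    @property
    def gamma(self):
        return -self.stretch

    @property
    def zeta(self):
        return 1.0 + self.stretch
```

The trade-off is that gradients shrink as the stretch approaches either bound, so it may still move slowly near the edges of the range.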

## Experiments
- Compare fixed vs adaptive stretch
- Ablation on stretch initialization
- Interaction with temperature scheduling
