Lowrey Organ Models by Year

Heretic: Fully automatic censorship removal for language models

Heretic is a tool that removes censorship (aka "safety alignment") from transformer-based language models without expensive post-training. It combines an advanced implementation of directional ...
