I wish I could prove that reusing samples for gradient estimation in optimization is a bad idea, but idk where I'd even start. it's never worked for me.


authors always try to justify it by showing the reused samples still follow the right PDF, but they never touch on how the reuse affects the iterates of the algorithm it's being fed into.
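here's a toy sketch of the gap I mean. it assumes the simplest possible reuse scheme: draw a small pool of samples once, then keep resampling from it. it also assumes a 1-D quadratic objective f(x) = E[(x - xi)^2 / 2] with xi ~ N(mu, 1), whose minimizer is mu. none of this is from a specific paper, and `sgd`, `compare`, `pool_size` etc. are made-up names for illustration. marginally, every reused draw is still N(mu, 1), so the "PDF is preserved". conditional on the pool, though, the iterates converge to the pool's mean instead of mu, and running more iterations doesn't remove that offset.

```python
import random
import statistics


def sgd(draw, n_steps, x0=0.0):
    """Plain SGD on f(x) = E[(x - xi)^2 / 2] with step size 1/(k+1).

    The stochastic gradient at x is (x - xi). With this step size the
    iterate after n steps is exactly the running mean of the n draws,
    so it converges to the mean of whatever distribution `draw` samples from.
    """
    x = x0
    for k in range(n_steps):
        xi = draw()
        x -= (x - xi) / (k + 1)
    return x


def compare(mu=1.0, pool_size=10, n_steps=100_000, seed=0):
    rng = random.Random(seed)

    # fresh samples: every gradient estimate uses a new xi ~ N(mu, 1)
    x_fresh = sgd(lambda: rng.gauss(mu, 1.0), n_steps)

    # reused samples (hypothetical scheme): draw a small pool once, then keep
    # resampling from it. each draw is still marginally N(mu, 1), but given
    # the pool, SGD is really minimizing the pool's empirical objective.
    pool = [rng.gauss(mu, 1.0) for _ in range(pool_size)]
    x_reused = sgd(lambda: rng.choice(pool), n_steps)

    return x_fresh, x_reused, statistics.fmean(pool)


if __name__ == "__main__":
    x_fresh, x_reused, pool_mean = compare()
    print(f"fresh:  {x_fresh:.4f}  (target mu = 1.0)")
    print(f"reused: {x_reused:.4f}  (pool mean = {pool_mean:.4f})")
```

with `pool_size=10` the offset is on the order of 1/sqrt(10) ≈ 0.3 standard deviations, regardless of `n_steps`. that's the kind of thing a PDF-level argument never sees, because it's a property of the iterates, not of any single draw.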

