Reinforcement Learning from Human Feedback (Alignment and post-training of LLMs)