Hành động

Chương trình Nghiên cứu AI Safety Mùa Thu 2025

(Cơ hội này sẽ hoàn toàn trong tiếng Anh, do bản chất của chương trình nghiên cứu cần đọc tài liệu tiếng Anh chuyên ngành. Tuy nhiên, bạn vẫn có thể giao tiếp với điều phối viên HAISN bằng tiếng Việt nếu muốn).

Applications is now open - apply here.

-

Antoan.AI is partnering with Hanoi AI Safety Network on the Fall Research Fellowship (FRF) to offer undergraduate/graduate students and early-career/mid-career professionals the opportunity to engage in AI safety research and join a community of people working on AI safety. Previous fellows have published their research projects at ICLR, ICML, and AAAI.

FRF projects span safety evaluations, interpretability, AI control, adversarial robustness, and more. You will have the chance to express your interests and preferred project further into the applications.

The FRF will run for 1-3 months starting August, depending on the nature of your project and participants’ availabilities. The FRF can be participated part-time and remotely. Funding is available for GPU compute and AI subscriptions.

You might be a good fit if:

  • You are interested in making advanced AI systems safe and trustworthy.
  • You have a technical background (e.g. ML, CS, cybersecurity, maths, physics, neuroscience, etc).
  • You are curiosity-driven and willing to pursue your own ideas.

Plus points (although not a requirement)

  • You have or plan to pursue a career in research.
  • Deep understanding of AI safety arguments and landscape.

The application process includes filling out a short form and a chat with one of our organisers. We will review applications on a rolling basis, and the deadline to apply is Friday August 8th 23:59 AOE. For more information, fill out this form.

Example AI safety projects below. Feel free to express interest in any of these, or propose your own!

  • Current AI systems often know when they are being evaluated. This is concerning, as it undermines the reliability of safety evaluations and allows for strategic deception, such as sandbagging (intentionally doing badly on evaluations). Blackbox approaches, i.e. simply asking the models when they are aware, might introduce unwanted signals, or might not be effective if models are sandbagging. Prior research points to interpretability tools as a promising avenue. Can we create model organisms of evaluation awareness sandbaggers through finetuning, then demonstrate that whitebox approaches such as probing or noising outperform purely blackbox?