I finished the OA on a Friday. On Monday afternoon of the following week, I got an email saying I had moved on to the interview stage, asking for my availability two weeks out. After I replied, the interview time was confirmed by email two days later. Two back-to-back rounds, one hour each. A Livecode link was provided but never used.
How the Interviews Went #
My interviewers were two Indian L6 senior applied scientists, apparently in the same department but on different teams. So my impression is that they weren't hiring directly for their own openings, but coordinating interviews within the department? I don't know the specifics.
The first interviewer spent the first half mostly on Amazon's classic behavioral interview. As I recall, he asked:
Tell me about a time when you had to take a calculated risk and make a decision where speed was important. What was the risk, and what was the outcome of your decision?
Tell me about a time when you had to deliver a result under a tight deadline. What did you do, and what was the result?
The behavioral part ran about 25-30 minutes, and we covered a lot.
Then came the resume deep dive. He asked about the details of my previous internship project, in great depth, and then had me walk through my research publications.
There were no ML breadth & depth questions.
The second interviewer had a slight accent, and the audio wasn't very clear; I had to ask him to repeat his first sentence three times before I understood it. I adapted to the accent after a while, though. The first part was again behavioral:
Tell me about a time when you had to learn something at a deeper level. What did you do, why did you do it, and what were the outcomes?
Tell me about a time when you did something outside of your responsibilities. What did you do, why, and what were the outcomes?
Honestly, I don't know what the STAR model specifically prescribes, and I didn't follow it. Nor did I try to figure out which Leadership Principle each question mapped to; I just thought briefly about which example would give me the most to talk about.
One thing worth noting: the interviewers kept taking notes while listening. I don't know exactly what for, but I suspect the notes feed into the final decision, so try not to repeat the same examples across the two interviews.
After the behavioral part, he went over my research very cursorily, then jumped straight into ML breadth with a lot of questions, covering three main areas: classic machine learning, neural networks, and LLMs:
- What are the sources of model bias in machine learning, and how do you mitigate them?
- What challenges come up in feature engineering (feature selection, one-hot encoding)?
- Explain RNNs.
- Explain cross-entropy loss. What is it, and why do we use it rather than least squares? (A short sketch follows this list.)
- Explain gradient descent in neural networks.
- What are transformers?
- What is prompt engineering?
- What are some prompt engineering techniques (Chain of Thought, ReAct, etc.)?
- What is chain of thought?
- What is the difference between prompt engineering and fine-tuning?
- What is knowledge distillation?
- How do you infer causality in data?
- Do you have experience dealing with time-series data?
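Since the cross-entropy question is a staple, here is a minimal sketch of the kind of answer I would aim for (my own reconstruction after the fact, not what was said in the interview):

```latex
% Binary label y \in \{0,1\}, predicted probability p = \sigma(z) for logit z.
\[
  L_{\mathrm{CE}} = -\bigl(y \log p + (1-y)\log(1-p)\bigr),
  \qquad
  \frac{\partial L_{\mathrm{CE}}}{\partial z} = p - y
\]
\[
  L_{\mathrm{SE}} = \tfrac{1}{2}\,(y - p)^2,
  \qquad
  \frac{\partial L_{\mathrm{SE}}}{\partial z} = (p - y)\,p\,(1-p)
\]
```

The squared-error gradient carries an extra factor p(1-p) that goes to zero exactly when the model is confidently wrong, so training stalls; cross-entropy keeps the gradient proportional to the error, and it is also the maximum-likelihood objective for a Bernoulli output.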
I hadn't expected questions unrelated to my research.
Here is how I read it:
The first interviewer has a PhD himself, so beyond the obligatory behavioral questions his main focus was research experience, which he probed in detail. The second interviewer has no PhD but a richer work history, so he cared less about my research depth and more about my practical skills and breadth of knowledge. The lesson I took from this interview: for areas outside but adjacent to your own research, such as LLMs and supervised learning in my case, know at least the basic concepts, though not necessarily in depth.
Neither round involved writing code. I think this mainly depends on what angle the interviewer wants to assess. After the behavioral questions and a quick pass over the resume, maybe 20-30 minutes remain; an interviewer who values coding might give an easy-to-medium problem, but given the limited time I doubt anything hard would come up.
How I Prepared #
First I read some applied scientist interview threads on 1point3acres (一亩三分地) and learned that the interviews mainly consist of: Amazon Leadership Principles behavioral questions, resume deep dive, ML breadth, ML depth, and coding.
For coding, I redid a few LeetCode problems I had solved before, picking easy ones.
I watched all the Leadership Principles videos on the official site, and for each principle jotted down two or three keywords to remind me which example to use in an answer.
For the resume deep dive, I first tried explaining my two most important projects in plain language, making sure I could get the core ideas across to an interviewer quickly. I also noticed that the second interviewer's hands-on experience involved a lot of big data and the first interviewer's a lot of optimization, so I reviewed my notes from the two relevant courses in case they asked.
For ML knowledge, I collected a lot of questions from 1point3acres threads:
ML basics/breadth:
- What is a random forest? What is the difference between it and a gradient boosted forest?
- What is a decision tree?
- Difference between CNN and RNN?
- Difference between word2vec and doc2vec?
- What is KMeans?
- What is AUC-ROC?
- What is overfitting, and what are L1 and L2 regularization?
- Explain the L1/L2 norm.
- Difference between encoder-decoder and decoder-only LLMs. Why use a decoder-only LLM? Challenges with decoder-only LLMs?
- Explain PCA in detail.
- What is the difference between RNN and LSTM?
- Why does L1 give sparse parameters? What are decision boundaries in logistic regression?
- Explain bias and variance. Which has larger variance/bias, neural networks or logistic regression?
- How do you control the bias-variance trade-off?
- Difference between self-supervised learning and unsupervised learning?
- Can self-supervised learning fall into the unsupervised learning category?
- What is the difference between MLE (maximum likelihood estimation) and MAP (maximum a posteriori)?
- What is the assumption of the hidden Markov model?
- How do you deal with unbalanced data?
- How do you deal with overfitting? (Early stopping, regularization, dropout.)
- What is data augmentation?
- What is the difference between generative and discriminative models? (Generative models care about the joint distribution whereas discriminative models care about the conditional distribution.)
- What is an activation function?
- What are gradient explosion and vanishing? How do you deal with vanishing gradients?
- How do you encode input in a classification problem?
- Explain how a VAE (variational autoencoder) works.
- Explain how a CNN works.
- What is the difference between the latent space learned by an auto-encoder and PCA? (PCA gives linear combinations of components whereas an auto-encoder can be non-linear.)
- Explain what dropout is.
- Explain softmax and sigmoid and their differences. (A quick sketch follows this list.)
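For that last question, here is a small runnable illustration (my own, in NumPy) of the practical difference between the two:

```python
import numpy as np

def sigmoid(z):
    # Squashes each logit independently into (0, 1);
    # suited to binary or multi-label outputs.
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Normalizes a logit vector into a probability distribution;
    # subtracting the max is the standard numerical-stability trick.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
print(sigmoid(logits))  # elementwise; does NOT sum to 1 (independent probabilities)
print(softmax(logits))  # sums to 1 (mutually exclusive classes)
```

A tidy way to tie the two together in an answer: softmax over two classes reduces to a sigmoid applied to the difference of the two logits.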
I only skimmed these, asking ChatGPT about the ones I didn't understand, but overall my preparation was insufficient, which is why this part didn't go well in the second interview. RNNs were among the topics I didn't understand and never studied closely, and I really was asked about them in the interview; I said I didn't know. My assumption had been that, since my research is mainly unsupervised learning, the interviewers wouldn't ask about neural networks or large language models, but they actually asked a lot.
Summary #
Overall I think my preparation strategy was sound. I saw that the coding bar was low, so I didn't focus my review there. The resume deep-dive prep helped a lot: these are your own experiences, after all, and whichever stint or project the interviewer picks, you need to be able to explain the details clearly enough for them to follow quickly.
Where I fell short was not preparing ML breadth seriously enough, which I regret. Then again, it did let the interviewers see my true level.
Overall, Amazon's interviews are different from Google's and Meta's. Behavioral questions take up at least a third of the time, and much of the interview is conversational, so the demands on the candidate's spoken English and listening are relatively high: you need fluent speech and a high tolerance for accents, or even mutual comprehension becomes a problem. That part can't be crammed at the last minute.
#work
Last modified on 2025-03-30