5.7.3 Implementing CW with PyTorch

智能系统与技术丛书 · AI安全之对抗样本入门 (Intelligent Systems and Technology Series · Introduction to Adversarial Examples for AI Security). Author: 兜哥

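    This section walks through the basic steps of implementing the CW algorithm in PyTorch. The example code is at:
    https://github.com/duoergun0729/adversarial_examples/blob/master/code/5-cw-pytorch.ipynb
    First define the global parameters. Their meaning is explained in the TensorFlow implementation details in Section 5.7.2. Two points differ from the TensorFlow version: the PyTorch AlexNet model trained on ImageNet 2012 has 1000 classes, and the value range after image preprocessing is -3.0 to 3.0.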
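    import numpy as np
    import torch
    from torch.autograd import Variable

    # Range of pixel values after preprocessing
    boxmin = -3.0
    boxmax = 3.0
    # Number of classes; 1000 in the PyTorch implementation
    num_labels = 1000
    # The attack target label must be one-hot encoded
    target_label = 288
    tlab = Variable(torch.from_numpy(np.eye(num_labels)[target_label]).to(device).float())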
    Run the iterations with binary search. In each round, define the trainable variable modifier and the Adam optimizer.
    for outer_step in range(binary_search_steps):
        print("o_bestl2={} confidence={}".format(o_bestl2, confidence))
        # Map the original image into tanh space, where the perturbation is added
        timg = Variable(torch.from_numpy(np.arctanh((img - boxplus) / boxmul * 0.999999)).to(device).float())
        modifier = Variable(torch.zeros_like(timg).to(device).float())
        # Enable gradients for the perturbation
        modifier.requires_grad = True
        # Define the optimizer; only modifier is optimized
        optimizer = torch.optim.Adam([modifier], lr=learning_rate)
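The arctanh mapping applied to img can be checked in isolation. The sketch below uses NumPy only and assumes boxmul = (boxmax - boxmin) / 2 and boxplus = (boxmax + boxmin) / 2, as in the standard CW change of variables; this section does not show how they are defined. The helper names to_tanh_space and from_tanh_space are hypothetical.

```python
import numpy as np

boxmin, boxmax = -3.0, 3.0
# Assumed definitions (standard CW change of variables; not shown in this section)
boxmul = (boxmax - boxmin) / 2.0   # half-width of the valid range
boxplus = (boxmax + boxmin) / 2.0  # centre of the valid range

def to_tanh_space(img):
    """Map values in [boxmin, boxmax] to unbounded tanh-space coordinates."""
    # The 0.999999 factor keeps arctanh finite when a value sits on the boundary
    return np.arctanh((img - boxplus) / boxmul * 0.999999)

def from_tanh_space(w):
    """Map any real-valued array back into [boxmin, boxmax]."""
    return np.tanh(w) * boxmul + boxplus

img = np.array([-3.0, -1.2, 0.0, 2.5, 3.0])
timg = to_tanh_space(img)

# A zero perturbation reproduces the image up to the 0.999999 shrink factor
recovered = from_tanh_space(timg)

# Even a very large perturbation cannot leave the box
perturbed = from_tanh_space(timg + 50.0)
```

Because the optimizer works on an unbounded variable and tanh squashes the result back into the box, no explicit clipping of the adversarial image is needed during optimization.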
    Define the new input newimg from modifier and the original image, then run a forward pass to obtain the model's current output.
    for iteration in range(1, max_iterations + 1):
        optimizer.zero_grad()
        # Define the new input
        newimg = torch.tanh(modifier + timg) * boxmul + boxplus
        output = model(newimg)
    Define the loss function. loss2 uses torch.dist directly to compute the L2 distance, and torch.clamp takes the maximum of loss1 and 0.
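    # L2 distance between the adversarial image and the original image
    loss2 = torch.dist(newimg, (torch.tanh(timg) * boxmul + boxplus), p=2)
    # Score of the target class (all other entries are masked to 0)
    real = torch.max(output * tlab)
    # Largest score among the non-target classes (target entry masked to 0)
    other = torch.max((1 - tlab) * output)
    loss1 = other - real + k
    # Hinge: zero once the target score exceeds the others by at least k
    loss1 = torch.clamp(loss1, min=0)
    loss1 = confidence * loss1
    loss = loss1 + loss2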
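To see how the two masked maxima and the clamp behave, here is a small worked example in NumPy with 5 classes instead of 1000. The logits, k = 2.0, and the helper name cw_loss1 are made up for illustration. In the snippet above, k acts as the margin, while the variable called confidence appears to play the role of the weight c that the binary search tunes.

```python
import numpy as np

def cw_loss1(output, tlab, k):
    """Hinge term as written in this section, before scaling by the weight c."""
    real = np.max(output * tlab)         # target score; other entries masked to 0
    other = np.max((1 - tlab) * output)  # best non-target score; target masked to 0
    return max(other - real + k, 0.0)

num_labels, target_label, k = 5, 3, 2.0
tlab = np.eye(num_labels)[target_label]  # one-hot target: [0, 0, 0, 1, 0]

# Target not yet dominant: class 1 leads, so the hinge is active
logits_fail = np.array([1.0, 6.0, 0.5, 4.0, 2.0])
loss_fail = cw_loss1(logits_fail, tlab, k)  # 6.0 - 4.0 + 2.0 = 4.0

# Target leads by more than k: the hinge is clamped to zero
logits_ok = np.array([1.0, 3.0, 0.5, 7.0, 2.0])
loss_ok = cw_loss1(logits_ok, tlab, k)      # max(3.0 - 7.0 + 2.0, 0) = 0.0
```

One edge case is worth knowing: because the mask multiplies by zero, a negative target score is reported as 0 by the maximum. The reference TensorFlow implementation by Carlini and Wagner instead sums output * tlab for the target score and subtracts a large constant at the target position before taking the maximum over the other classes.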
    Backpropagate the loss and update the variable modifier.
    loss.backward(retain_graph=True)
    optimizer.step()
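    During the Adam iterations, if the label predicted for the adversarial example matches the target label of the targeted attack, the attack has succeeded. When the L2 distance is also smaller than the best value so far, update o_bestl2, o_bestscore, and o_bestattack.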
    l2 = loss2
    sc = output.data.cpu().numpy()
    # Print the losses every 10% of the iterations
    if iteration % (max_iterations // 10) == 0:
        print("iteration={} loss={} loss1={} loss2={}".format(iteration, loss, loss1, loss2))
    if (l2 < o_bestl2) and (np.argmax(sc) == target_label):
        print("attack success l2={} target_label={}".format(l2, target_label))
        o_bestl2 = l2
        o_bestscore = np.argmax(sc)
        o_bestattack = newimg.data.cpu().numpy()
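The code that adjusts c between binary search rounds is not shown in this section. A minimal sketch of the usual CW bisection follows; the function name update_const, the bounds, and the starting value of 100 are assumptions, not taken from the notebook. Under these assumptions, nine consecutive successful rounds halve c from 100 down to 0.1953125, which is consistent with the c value reported for the tenth round in this section.

```python
def update_const(c, lower, upper, success):
    """One bisection step on the weight c (hypothetical helper)."""
    if success:
        # The attack worked: try a smaller c to favour a smaller distortion
        upper = min(upper, c)
        c = (lower + upper) / 2
    else:
        # The attack failed: c must grow
        lower = max(lower, c)
        # Bisect if an upper bound is known, otherwise grow tenfold
        c = (lower + upper) / 2 if upper < 1e9 else c * 10
    return c, lower, upper

# Assumed starting point: c = 100 and no upper bound found yet
c, lower, upper = 100.0, 0.0, 1e10
history = [c]
for _ in range(9):  # nine updates separate round 1 from round 10
    c, lower, upper = update_const(c, lower, upper, success=True)
    history.append(c)

print(history[-1])  # 0.1953125
```

A smaller c puts more weight on the L2 term, so shrinking c after each success pushes the search toward adversarial examples with less distortion.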
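    As shown in Figure 5-32, after 10 rounds of binary search with 1000 Adam iterations per round, the attack succeeds. The value of c is 0.1953125. L0 is 90441, meaning that only 90441 pixels were modified, and L2 is 69825.1, which measures the difference between the adversarial example and the original image.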
    l0=90441 l2=69825.06902252228
    Figure 5-32 Comparison of the original data and the adversarial example (1000 Adam iterations)
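How l0 and l2 are computed is not shown in this section. One common way, sketched here with NumPy on a toy 2x2 array, counts the changed elements for L0 and takes the Euclidean norm of the difference for L2. The arrays and the helper name distortion are made up, and the notebook may compute these values on a differently scaled image.

```python
import numpy as np

def distortion(original, adversarial):
    """Return (L0, L2) of the difference between two arrays of equal shape."""
    diff = adversarial.astype(np.float64) - original.astype(np.float64)
    l0 = int(np.count_nonzero(diff))  # number of changed elements
    l2 = float(np.linalg.norm(diff))  # Euclidean norm of the change
    return l0, l2

original = np.array([[10.0, 20.0], [30.0, 40.0]])
adversarial = np.array([[10.0, 23.0], [26.0, 40.0]])

# Two elements changed, by 3 and by 4, so L2 = sqrt(3^2 + 4^2) = 5.0
l0, l2 = distortion(original, adversarial)
print("l0={} l2={}".format(l0, l2))
```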