Too much going on at #acl2022nlp for live-tweeting, but I’ll do a wee thread on 3 papers I found thought-provoking: one on robustness probing by @jmderiu et al.; one on underclaiming by @sleepinyourhat; and one on bots for psychotherapy by Das et al..
Deriu et al. stress-test automated metrics for evaluating conversational dialogue systems. They use Blenderbot to identify local maxima in trained metrics and so identify blatantly nonsensical response types that reliably lead to high scores https://aclanthology.org/2022.acl-short.85/
As they write, "there are no known remedies to this problem". My conjecture (also see Goodhart's law): any automated metric will be affected by this as long as we're training on form alone. It's a thought-provoking paper, go read it
Next! Bowman https://aclanthology.org/2022.acl-long.516 acknowledges the harms of hype but focuses on the inverse: overclaiming the scope of work on limitations (='underclaiming'). I think his argument underestimates the enormous asymmetry of these cases and therefore may overclaim the harms?
I did wonder whether @sleepinyourhat is playing 4D chess here by writing a paper that's likely to attract citations from work that may have an incentive to overclaim the harms of underclaiming 🤯😂 #acl2022nlp
Third is Das et. al https://aclanthology.org/2022.bionlp-1.27 who propose to expose psychologically vulnerable people to conversational bots trained on Reddit, which frankly is every bit as bad an idea as it sounds (the words "ethics" and "risk" do not occur in the paper 🤷) #acl2022nlp #bionlp
There’s been loads more interesting and intriguing work at #acl2022nlp and I have particularly enjoyed the many talks in the theme track sessions on linguistic diversity. Check out the hundreds of papers (8831 pages) in the @aclanthology here: https://aclanthology.org/events/acl-2022
Okay because @KLM has decided to cancel my flight and delay the next one, some quick notes from the liminality of Dublin Airport on a few more #acl2022nlp papers I found interesting, revealing, or thought-provoking
Ung et al. (Facebook AI Research) train chatbots to say sorry in nicer ways, though without addressing the underlying problems that make them say offensive things in the 1st place. I thought this was both interesting and revealing of FBs priorities. Paper:https://aclanthology.org/2022.acl-long.447
Room for improvement: throughout, Ung et al remove "stop words" — but as conversation analysts can tell you, turn prefaces like uh, um, well, etc. often signal interactionally delicate matters, i.e. precisely the stuff they're hoping to track here 😬
Further, feedback is seen as strictly individual — whereas in normal human interaction it (also) reinforces *social* norms. Consider: those offended may not always have the social capital, privilege or energy to speak out ➡️ FBs bots will blithely continue to offend them 🤷