With the rise of AI, I feel like something that anonymizes writing styles should exist. For example, it could look for differences between American and British spelling, like color versus colour, or contextual word choices, like soccer versus football, and make edits accordingly. ChatGPT could be fed a prompt that says “Rewrite the following paragraphs as if they were written by an Australian”, but I don’t know if it would have a good enough grasp of the objective or if it would start shoehorning in references to koalas and fairy floss.
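To make the spelling part of the idea concrete, here is a rough Python sketch. The word list and the function names are made up for illustration, and a real tool would need a full lexicon. It only handles straight spelling swaps. Contextual pairs like soccer versus football are deliberately left out, since swapping them blindly would change the meaning in some sentences.

```python
import re

# Hypothetical, tiny British -> American mapping for illustration only.
BRITISH_TO_AMERICAN = {
    "colour": "color",
    "favourite": "favorite",
    "organise": "organize",
    "centre": "center",
}


def _match_case(replacement, original):
    """Copy simple capitalization (ALL CAPS or Leading Cap) from the original word."""
    if original.isupper():
        return replacement.upper()
    if original[0].isupper():
        return replacement.capitalize()
    return replacement


def normalize_spelling(text, mapping=BRITISH_TO_AMERICAN):
    """Replace whole words found in `mapping`, keeping the original capitalization."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, mapping)) + r")\b",
        re.IGNORECASE,
    )

    def swap(match):
        word = match.group(0)
        return _match_case(mapping[word.lower()], word)

    return pattern.sub(swap, text)


print(normalize_spelling("My favourite Colour is in the centre."))
```

Spelling is only one of many signals, so a pass like this would at best be a first step and would not hide an author on its own.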

I tried searching online to see if something like this existed and found a few articles from around the 2010s, such as “Software Helps Identify Anonymous Writers or Helps Them Stay That Way” by the New York Times. It talks about stylometry and Anonymouth, but it seems like Anonymouth hasn’t been updated in years. All the recent articles seem to be about plagiarism and AI.

For context, what got me thinking about the topic was remembering J.K. Rowling being revealed to be the author of a mystery novel called The Cuckoo’s Calling. Smithsonian wrote an article about it called “How Did Computers Uncover J.K. Rowling’s Pseudonym?” I thought it could make for a neat post here.

  • utopiah@lemmy.ml
    10 months ago

    I wouldn’t just trust random Lemmy users (no offense) but instead check for the actual fields, e.g. stylometry or writeprint, and from there check the state of the art. Not being an expert makes that tricky, so I would take a recently published paper, e.g. https://arxiv.org/abs/2203.11849 to understand the challenge. As is always the case, they review the field, e.g. section 2 here, and clarify the 2 sides of the arms race, here Obfuscation/Deobfuscation. On the former, 3.2 mentions examples of techniques the authors consider to be good starting points, e.g. writeprintsRFC. I’d then search for such tools if they don’t directly provide a link to an open-source repository, e.g. theirs https://github.com/reginazhai/Authorship-Deobfuscation . I would then try a recent one that I can easily set up, e.g. via Docker, and give it a go. I would then read the rest of the paper, see who cites it, and try to get a more up-to-date version.

    TL;DR: I don’t know, but there is dedicated research whose results I’d trust more than the opinion of strangers who are probably not experts.