Ep. 068 – Finding the Needle in the Haystack: Identification through Writing Style

Did you know that the unique little errors and style you have in writing can pinpoint you like a fingerprint, even in your anonymous online posts?  Join us as we discuss this exciting topic with Sadia Afroz. April 13, 2015

Get Involved

The SECTF Webinar is still on the RESOURCES PAGE

Got a great idea for an upcoming podcast? Send an email to contribute -@- social-engineer.org

Enjoy the Outro Music? Thanks to Clutch for allowing us to use Gone Cold, one of the best songs they have ever written, as our new SEPodcast Theme Music

If you want to see something really unique – check out our buddies at Exploitable Labs

And check out a schedule for all our training at Social-Engineer.Com

Show Notes

  • Can you discuss the different methods to detect these deceptions by analyzing writing style?
    • Is it possible to detect multiple identities from a single email?
    • Would it be possible to do this with just a single email or would you need multiple?

 

  • I know you have done a lot of work in stylometry. Your publication “Doppelgänger Finder: Taking Stylometry To The Underground” was quite popular amongst InfoSec news publications.
    • Can you briefly introduce the concept of stylometry and discuss how stylometry applies to information security?
  • So, analysis of underground forums can provide key information about who controls a given bot network or sells a service, and the size and scope of the cybercrime underworld.
    • Can you talk about some of the most prevalent findings here?
    • Does this apply specifically to web presence, or can this also apply to phishing campaigns?

 

  • In your article “Use Fewer Instances of the Letter “i”: Toward Writing Style Anonymization,” your framework defines the steps necessary to anonymize documents. Can you explain how this works?
    • What about for phishing emails where there are multiple authors? Is this something you are able to determine?
  • Can you speak about how you’ve found security and privacy to intersect with machine learning and the implications this has for the human element?

 

Also check out her BIO PAGE

The freeware writing recognition tool

AND

Her favorite book

 

Comments

  1. AC says

    I have some surprising news: the link to her favorite book is even CENSORED in Iran 😐 (I’m from Iran).

  2. says

    Thank you for another fantastic post. Where else may just anybody get that type of
    info in such an ideal approach of writing? I have a
    presentation next week, and I am on the search for such info.

  3. Paul says

    Greatly exaggerated for media consumption, as the described methods are not as accurate as described. There is certainly more to it: what she describes is isolated experiments in controlled environments with limited variations in topics.

    Her paper on doppleganger something is also not evaluated by using ground truth information; her only criteria is based on how well the clustering algorithm separates them which practically doesn’t mean anything so basically she has no idea for sure if the ones that are similar enough are actually the same person.

    Phishing emails remark is exaggerated as it only applies to bulk phishing email attacks it certainly does not apply to “spear phishing” email attacks.

    I am really disappointed that marketing is poisoning academia.

  4. Sadia says

    Paul,
    >Her paper on doppleganger something is also not evaluated by using ground truth information

    I’d encourage you to read the paper.
    The method was evaluated on a blog dataset with ground truth where every author had two blogs.
    In the underground forum case, we didn’t have any solid ground truth available, as those were leaked datasets from the internet, which is why that section of the paper is called “a case study.” However, in some cases the messages of the users can provide some indications, like when the users themselves mention their other accounts.

    >I am really disappointed that marketing is poisoning academia.

    Academic articles are peer-reviewed. So I’d encourage you to join as a peer and review the papers.
