
Is there software, a script, or any other way to automatically replace every "her" in a document with "him" or "his", whichever one applies?

example

Calls her and tells her that her car is …

to

Calls him and tells him that his car is …

1 Answer


The short answer: Yes, but it's harder than you think.

The long answer: Ordinary find-and-replace code operates on a morphological level, that is, by looking at the form of a text rather than by understanding its meaning. But the objective and possessive cases of the third-person feminine pronoun are both spelled "her", so there is no morphological indication to tell them apart, and ordinary find-and-replace can't distinguish them. To do so, you need a tool which can analyze your text at a lexical level: one which can examine each word in context and work out the grammatical role it plays.

This is a much harder problem than simple find-and-replace, unless your problem domain is strictly enough constrained that you can hack together a few heuristics and then manually inspect and patch up the result. If you can get away with that, great!
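As a rough illustration of the "few heuristics" route, here is a minimal Python sketch. It guesses the case of each "her" from the word that follows it. The function name `swap_her` and the word list `OBJECT_FOLLOWERS` are made up for this example, and the list is a hand-picked starting point rather than anything complete, so the output still needs manual inspection.

```python
import re

# Hypothetical, hand-picked list: if one of these words follows "her",
# we guess that "her" is an object ("him") rather than a possessive ("his").
OBJECT_FOLLOWERS = {
    "and", "or", "but", "that", "to", "if", "when", "because",
    "about", "for", "with", "at", "in", "on", "a", "an", "the",
}

# Match "her" as a whole word; the lookahead captures the gap after it
# and the next word without consuming them.
HER = re.compile(r"\bher\b(?=(\W*)(\w*))", re.IGNORECASE)


def swap_her(text):
    """Replace each "her" with "him" or "his" using a next-word heuristic."""
    def repl(match):
        gap, next_word = match.group(1), match.group(2)
        is_object = (
            next_word == ""                          # end of text
            or any(ch in gap for ch in ".,;:!?")     # punctuation follows
            or next_word.lower() in OBJECT_FOLLOWERS
        )
        new = "him" if is_object else "his"
        # Preserve a leading capital, e.g. at the start of a sentence.
        if match.group(0)[0].isupper():
            new = new.capitalize()
        return new

    return HER.sub(repl, text)


print(swap_her("Calls her and tells her that her car is ready."))
```

A heuristic like this breaks easily: in "I saw her run", the word after "her" is not in the list, so the sketch would wrongly produce "his". That is the kind of case you would have to patch up by hand.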

If not, and it's a problem worth the effort of writing code to do the job properly, then you'll do well to start with the Stanford NLP Project's software repository, specifically CoreNLP, which includes an excellent part-of-speech tagger -- the exact tool you need to perform the lexical analysis I described.

To produce an example of what you get from CoreNLP, I fed the CoreNLP online demo the following sentence, based on your examples:

He calls her and tells her that her car is ready for pickup.

which it tokenized and tagged as follows:

Id Word   Lemma  Char begin Char end POS  NER Normalized NER Speaker 
—— —————— —————— —————————— ———————— ———— ——— —————————————— ——————— 
1  He     he     0          2        PRP  O                  PER0    
2  calls  call   3          8        VBZ  O                  PER0    
3  her    she    9          12       PRP  O                  PER0    
4  and    and    13         16       CC   O                  PER0    
5  tells  tell   17         22       VBZ  O                  PER0    
6  her    she    23         26       PRP$ O                  PER0    
7  that   that   27         31       DT   O                  PER0    
8  her    she    32         35       PRP$ O                  PER0    
9  car    car    36         39       NN   O                  PER0    
10 is     be     40         42       VBZ  O                  PER0    
11 ready  ready  43         48       JJ   O                  PER0    
12 for    for    49         52       IN   O                  PER0    
13 pickup pickup 53         59       NN   O                  PER0    
14 .      .      59         60       .    O                  PER0    

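With reference to a list of the de facto standard part-of-speech tags (the Penn Treebank tag set), we find that CoreNLP distinguishes the two cases in which we're interested: PRP for personal pronouns, PRP$ for possessive pronouns. Tokens 3 and 8 are tagged as expected. Token 6, however, is the object of "tells" and is listed as PRP$ where PRP would be expected, so tagger output still deserves a spot check.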

Armed with this information, and the knowledge of the opposite-gender equivalent of each pronoun case, we could perform our replacements. In fact, since CoreNLP tells us character positions as well as parts of speech, we need not use find-and-replace semantics at all: we could walk the token list and reconstruct the sentence word by word, copying the whitespace between tokens and replacing pronouns of interest as we encounter them.
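The reconstruction step might look something like the following Python sketch. It does not call CoreNLP: the token tuples are typed in by hand from the table above, with one deliberate change, namely that token 6 (the object of "tells") is entered as PRP, whereas the table lists PRP$. The names `SWAPS`, `TOKENS` and `rebuild` are invented for this illustration.

```python
SENTENCE = "He calls her and tells her that her car is ready for pickup."

# (word, char_begin, char_end, pos), hand-copied from the table above.
# Assumption: token 6 is entered as PRP here; the table shows PRP$.
TOKENS = [
    ("He", 0, 2, "PRP"), ("calls", 3, 8, "VBZ"), ("her", 9, 12, "PRP"),
    ("and", 13, 16, "CC"), ("tells", 17, 22, "VBZ"), ("her", 23, 26, "PRP"),
    ("that", 27, 31, "DT"), ("her", 32, 35, "PRP$"), ("car", 36, 39, "NN"),
    ("is", 40, 42, "VBZ"), ("ready", 43, 48, "JJ"), ("for", 49, 52, "IN"),
    ("pickup", 53, 59, "NN"), (".", 59, 60, "."),
]

# Feminine pronoun plus tag -> masculine equivalent.
SWAPS = {
    ("she", "PRP"): "he",
    ("her", "PRP"): "him",
    ("her", "PRP$"): "his",
}


def rebuild(original, tokens):
    """Walk the tokens in order and reassemble the text with pronouns swapped."""
    pieces = []
    cursor = 0
    for word, begin, end, pos in tokens:
        # Copy whatever sits between the previous token and this one
        # (normally whitespace) straight from the original text.
        pieces.append(original[cursor:begin])
        replacement = SWAPS.get((word.lower(), pos))
        if replacement is None:
            pieces.append(original[begin:end])
        elif word[0].isupper():
            pieces.append(replacement.capitalize())
        else:
            pieces.append(replacement)
        cursor = end
    pieces.append(original[cursor:])  # anything after the last token
    return "".join(pieces)


print(rebuild(SENTENCE, TOKENS))
```

Because the gaps are copied from the original by character offset, spacing and punctuation survive untouched and only the tagged pronouns change. The result is only as good as the tags: with token 6 left as PRP$, the same code would emit "tells his".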

And that's how you can do it! Obviously, this is more or less the lightest possible treatment of such a complex subject -- but, if you're inclined to write the necessary code, this should be enough to get you into the starting blocks. Good luck!

Aaron Miller