
I recently converted a PDF to Microsoft Word, and after changing the margins and paper size of the converted file I'm facing a problem: there are paragraph marks in the middle of the dialogue. Here is an example:

"When Fillmore was dying, he was super hungry. But his doctor was trying to starve his fever or whatever.
Fillmore wouldn't shut up about wanting to eat, though, so finally the doctor gave him a tiny teaspoon of soup.
And all sarcastic, Fillmore said, 'The nourishment is palatable,' and then died. No truce."

This dialogue was supposed to be a single paragraph, and the converted document has many similar problems. How do I fix it? (Ideally I'd like to fix them all at once to save time.)
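One way to fix many occurrences at once is to rejoin the broken lines with a script. This is a minimal sketch, not a built-in Word feature, and it rests on assumptions: the converted text has been saved as plain text, and every wrongly split paragraph is a quotation that opens on one line and closes on a later one. The function name `rejoin_quoted_lines` is hypothetical. Lines are merged while a double quote (straight or curly) is still open; an unbalanced quote anywhere would cause wrong merges.

```python
QUOTE_CHARS = '"\u201c\u201d'  # straight quote plus Word-style curly quotes


def rejoin_quoted_lines(text: str) -> str:
    """Join lines that were split while a double-quoted passage is still open.

    Hypothetical heuristic: each double-quote character toggles an
    "inside dialogue" flag. While the flag is set, the next line is appended
    to the current paragraph with a space instead of starting a new one.
    """
    paragraphs = []
    current = ""
    open_quote = False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # A blank line always ends the current paragraph.
            if current:
                paragraphs.append(current)
                current = ""
            open_quote = False
            continue
        current = f"{current} {stripped}" if current else stripped
        # An odd number of quote characters on this line flips the state.
        if sum(stripped.count(q) for q in QUOTE_CHARS) % 2 == 1:
            open_quote = not open_quote
        if not open_quote:
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return "\n".join(paragraphs)
```

Single quotes are deliberately ignored, so nested quotations like `'The nourishment is palatable,'` do not affect the result. Paragraphs that are split outside of any dialogue are left untouched.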

Wes Sayeed

2 Answers


From what I understand, the problem above is related to OCR processing. OCR converts whatever it recognizes into a document, and it often misreads characters that can be interpreted in more than one way. This is not an exact science yet: OCR programs have evolved a lot, but they are still not 100% accurate, so we always need to fix these errors manually.

MisterVSE
  • If I understand the problem, it doesn't involve OCR, just converting a PDF to Word. – fixer1234 Sep 25 '15 at 17:32
  • In many cases you get the same issue when copying from a PDF and pasting somewhere else. It depends on how the PDF was produced. – DavidPostill Sep 25 '15 at 19:21
  • Yes, I was wrong. OCR is involved only when you scan a printed sheet. If the PDF was created by scanning, OCR can make that mess when it creates text from the scanned image. – MisterVSE Sep 27 '15 at 12:46

I don't think a PDF document has paragraph structures like those in a word processor. A PDF just positions text on the page, with the characters represented either as a printed image or as associated characters, and I don't believe it carries formatting information such as color, paragraphs, or line breaks (the Enter key).
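To make this point concrete: if a PDF only records where each line of text sits on the page, a converter has to guess where paragraphs end. A minimal sketch of one simple guess, a vertical-gap rule, is shown here; the `TextLine` type, the `line_spacing` value and the 25% tolerance are hypothetical, and real converters may use quite different rules. When such a guess misfires (for example, after the page size or margins change how lines are laid out), lines can end up as separate paragraphs, which is one way stray paragraph marks like those in the question could arise.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    y: float  # hypothetical: top of the line, in points from the top of the page
    text: str


def group_into_paragraphs(
    lines: list[TextLine], line_spacing: float, tolerance: float = 0.25
) -> list[str]:
    """Guess paragraph boundaries from vertical gaps between positioned lines.

    Assumption: lines in the same paragraph are about `line_spacing` apart,
    so a gap larger than line_spacing * (1 + tolerance) starts a new paragraph.
    """
    paragraphs: list[str] = []
    current: list[str] = []
    prev_y = None
    for ln in sorted(lines, key=lambda item: item.y):
        if prev_y is not None and ln.y - prev_y > line_spacing * (1 + tolerance):
            paragraphs.append(" ".join(current))
            current = []
        current.append(ln.text)
        prev_y = ln.y
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

The only paragraph information here is inferred from spacing, which is why the result depends entirely on how the text happens to be positioned.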

Jamal
MisterVSE
  • Thanks for your continued effort to help with this question. Super User is a bit different from a forum, where people contribute any information they think will be helpful. The site is a knowledge base, and answers are intended to be definitive. Anything you aren't sure of, you should research before posting it as an answer (and the best answers include citations from an authoritative source). – fixer1234 Sep 27 '15 at 18:44
  • I think what I wrote is helpful; that's why I posted it here. If anyone has a better answer, they can contradict my statement, perhaps with an answer here. Since there are no other answers, I thought my understanding of the subject was not too far off. – MisterVSE Sep 28 '15 at 17:06
  • You deserve credit for being the only one, so far, to respond. You're thinking in terms of a forum model, though, where you post what you think is right and if it isn't, somebody else will post a different answer. Think of Super User more like a textbook. You might be the only author. But whatever goes in should be authoritative information. If you aren't already an authority on the subject, research it before you post rather than posting what you think it is. BTW, answers that cite an authoritative source tend to get a lot more upvotes, and wrong answers without citation attract downvotes. – fixer1234 Sep 28 '15 at 17:27
  • Thank you for the comment. I wasn't thinking about the votes; I just wanted to add a comment that would help the user get along (in this case not solve the problem, because in my experience there is no easy way from PDF to Word) rather than desperately chase a chimera that would solve it. I confess I added a wrong comment because at first I didn't understand the problem in detail (OCR indeed has nothing to do with it). – MisterVSE Sep 29 '15 at 20:23