February 3rd, 2016

Word, the Web, and the Dangers of Pasting HTML


Why won’t copy/paste work with my CMS?

If you’ve spent any amount of time working with a CMS, you’ll probably have encountered some form of this problem:

  1. You have text from a Word document or somewhere on the Web that you want to publish on your website.
  2. You copy and paste that text into your CMS.
  3. All hell breaks loose.

The formatting on your document has changed, strange symbols have appeared without explanation, and even your efforts to edit the document are foiled. What’s going on?

You’ve probably encountered variations of this elsewhere if you’ve done any amount of copy/pasting to or from places on the Internet. For instance, if you’ve ever copied a block of text off a webpage into a Word document, you may have noticed that the formatting came along with the text. The same thing probably happened if you pasted that text into an email. The question is: why?

Because you’re not just pasting text. You’re pasting HTML.

What does it mean to paste HTML?

As an example, I’ve copy/pasted an article from my Facebook feed into our CMS. Here’s what it looks like on Facebook:

[Screenshot: the article as it appears on Facebook]

And here’s what it looks like pasted into the visual view of our CMS:

[Screenshot: the same article pasted into the visual view of the CMS]

And here’s what it looks like once I toggle to the plain text view:

[Screenshot: the same article in the plain text view of the CMS]

Oh man.

This is a problem for you for several reasons:

  1. Your site already has a stylesheet, and the HTML you’ve pasted from that other location could conflict with your site’s stylesheet, resulting in a big, ugly mess on your front end.
  2. You don’t know what you’re copying over. Most of the time you’ll just have a lot of needless markup cluttering up the back end, but you could also copy over some unwanted links.
  3. If you copy and paste an image from another website, the pasted image still points to the file on that website. If the original website removes that image, you will lose it as well. Instead, save that image to your computer, then upload it to your media library.
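
If you’re comfortable running a small script, here’s one way to see what a paste is carrying before it lands on your site. This is only a minimal sketch using Python’s built-in HTML parser: the `PasteAuditor` class, the `audit_paste` function, and the sample markup are all made up for illustration, and none of it is part of WordPress or any other CMS.

```python
from html.parser import HTMLParser


class PasteAuditor(HTMLParser):
    """Collects links, image sources, and inline styles from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []          # href values of <a> tags
        self.images = []         # src values of <img> tags
        self.inline_styles = 0   # number of tags carrying a style attribute

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        if "style" in attrs:
            self.inline_styles += 1


def audit_paste(html):
    """Parse a pasted fragment and return the filled-in auditor."""
    auditor = PasteAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor


# Hypothetical pasted fragment, invented for this example.
pasted = (
    '<p style="font-family: Arial; color: #333">Read the '
    '<a href="https://example.com/article">full story</a></p>'
    '<img src="https://example.com/photo.jpg" alt="">'
)

report = audit_paste(pasted)
print(report.links)          # links that came along with the text
print(report.images)         # images still hosted on the other site
print(report.inline_styles)  # tags whose styling may fight your stylesheet
```

Anything that shows up in `report.images` is an image still living on someone else’s server, and anything counted in `report.inline_styles` is formatting that could conflict with your own stylesheet.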

How do I avoid copying and pasting unwanted HTML?

First, let me back up. What was I talking about when I mentioned toggling between the “visual” and “text” views of our CMS?

Any decent CMS (here I’m demonstrating with WordPress) will allow you to see both the text as you expect it to appear on your website, and the HTML that governs the formatting behind it. This gives you the option of viewing bolded text (for instance) either in bold or with the HTML tags in place. Once again, here’s the opening paragraph of this blog post, as it appears in the back end of our CMS:

[Screenshot: the opening paragraph of this post in the visual view]

See how in the upper right-hand corner there’s a little tab that says “visual”? In this view, you can see the words I’ve typed, and the ordered list that follows. Here’s how it looks from the text view:

[Screenshot: the opening paragraph of this post in the text view]

Now, instead of the neatly formatted text from the visual view, we see the HTML that describes how that text should appear. This will show you any HTML you might have accidentally copied over from another source, but it also acts as a plain text editor: any text you paste here will be automatically stripped of any HTML and rendered strictly as text.

If you do this, you will have to re-format your text to include bold, italics, bullet points, and the like, but it will save you a lot of trouble in the long run.
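
To make “stripped of any HTML” a little more concrete, here’s a rough sketch of the idea in Python: keep the words, throw away the tags. It is not how WordPress or any other CMS does it internally; the `TextOnly` class, the `strip_html` function, and the sample markup are my own inventions for illustration.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keeps the text between tags and discards the tags themselves."""

    # Elements whose contents are code or styling, not readable text.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a <script> or <style> element.
        if not self.skip_depth:
            self.parts.append(data)


def strip_html(html):
    """Return the readable text of an HTML fragment with all tags removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by the removed tags.
    return " ".join("".join(parser.parts).split())


# Hypothetical fragment of the kind a word processor might produce.
pasted = '<p class="MsoNormal"><b>Hello</b>, <span style="color:red">world</span>!</p>'
print(strip_html(pasted))  # Hello, world!
```

Notice that the bold and the red color are gone along with the tags, which is exactly why you have to re-apply your formatting afterward.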

Any decent CMS will have a similar option, so if you’re not using WordPress, look around for it on whatever you’re using. If you can’t find one, you can achieve the same results by pasting into a plain text editor first, such as Notepad on Windows or TextEdit in plain text mode on a Mac, and then copying the text from there into your CMS.

Any questions?

If anything above seems unclear, or if it hasn’t resolved your problem, feel free to reach out! We’re always happy to lend our insight.
