So, full disclosure: from time to time I write patient education content for an academic medical center, covering orphan and rare diseases, complex and simple cancers, and its genomic testing center. Though I am a former employee of Penn’s Annenberg School for Communication, the views reflected throughout are entirely my own. I had no involvement in Tim Libert’s research, nor in any subsequent reporting on it, save for this post. I should also admit that I like saving money at Target once in a while too.

In this post I’ll explore some of the lesser-known ramifications of big data and search optimization in the health care vertical, from the checkout line at Target all the way to the marketing efforts of nationally recognized health care organizations. Then I’ll lay out a few ways users and content marketers can rethink their online behavior in pursuit of a more peaceful coexistence.

But first:

A Cautionary Tale

What follows is my take on a cautionary tale told around the campfire at every digital marketing conference that took place circa 2012-2013.

We’ve all been there: that moment at the checkout line when you scan your membership card or loyalty QR code for an extra percentage off a few select items. Not only do you save a few dollars on this trip, but you also walk away with a seemingly endless stream of coupons for things you didn’t realize you needed:

  • Maybe it’s late winter and there’s a coupon for sunscreen?
  • Maybe it’s late summer and there’s a coupon for snow boots?

Man oh Manischewitz, how did they know I’d be in need of new printer cartridges?

Point being, you walk away from the register and do one of two things: either you crumple up the three trees’ worth of coupons you just got and throw them away, or you carefully pick out one or two, slap them up on the fridge and hope you remember them next time you make a Target / CVS / ACME / whatever run.

So far so good, right? No major violations of your privacy on the surface, just a helpful stream of coupons you may need soon.

Target (and, I should stress, many other major national retailers, coffee shops, and wholesale clubs) rewards shoppers with a percentage off their bill at checkout if they scan a barcode card tied to their name, address and phone number. This lets the retailer keep track of what customers with different demographic profiles are likely to purchase.

On the surface that does seem fair. At such a large scale it makes sense to keep track of what goods are sold where, at what frequency, at what time of day, week and year, and to whom, right? Well, there’s a little more going on here.

That’s exactly what the father of a Minneapolis teenager supposedly learned the hard way in 2012 when flipping through the plethora of coupons he got from Target.

The horror story as told in The New York Times, Forbes, and Slate goes something like this:

“My daughter got this in the mail!” [said an angry father to the manager at Target]. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”


The manager didn’t have any idea what the man was talking about. He looked at the mailer. Sure enough, it was addressed to the man’s daughter and contained advertisements for maternity clothing, nursery furniture and pictures of smiling infants. The manager apologized and then called a few days later to apologize again.


On the phone, though, the father was somewhat abashed. “I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.”

Turns out Target was using a predictive big data model to try to guess what customers would need ahead of time:

As [Target statistician Andrew Pole’s] computers crawled through the data, he was able to identify about 25 products that, when analyzed together, allowed him to assign each shopper a “pregnancy prediction” score. More important, he could also estimate her due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy.


One Target employee [a New York Times reporter] spoke to provided a hypothetical example. Take a fictional Target shopper named Jenny Ward, who is 23, lives in Atlanta and in March bought cocoa-butter lotion, a purse large enough to double as a diaper bag, zinc and magnesium supplements and a bright blue rug. There’s, say, an 87 percent chance that she’s pregnant and that her delivery date is sometime in late August. – The New York Times
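To make the idea of a “pregnancy prediction” score a little more concrete, here is a minimal Python sketch of how a handful of purchase signals could be rolled into a single number. Target never published its ~25 products or how it weighted them, so every product name, weight and the baseline below is hypothetical. The logistic function used here is just one common way to squash a weighted sum into a score between 0 and 1; it is an assumption, not Pole’s documented method.

```python
import math

# Hypothetical signals: Target never published its ~25 products or their weights,
# so these names and numbers are invented purely for illustration.
PRODUCT_WEIGHTS = {
    "unscented_lotion": 1.2,
    "cocoa_butter_lotion": 0.9,
    "oversized_purse": 0.6,
    "zinc_supplement": 0.8,
    "magnesium_supplement": 0.8,
    "bright_blue_rug": 0.3,
}
BASELINE = -3.0  # hypothetical: with no signals at all, a shopper scores low


def pregnancy_score(basket: set[str]) -> float:
    """Sum the weights of the signal products in a basket, then map to 0..1."""
    z = BASELINE + sum(w for item, w in PRODUCT_WEIGHTS.items() if item in basket)
    return 1 / (1 + math.exp(-z))  # logistic function


jenny = {"cocoa_butter_lotion", "oversized_purse", "zinc_supplement",
         "magnesium_supplement", "bright_blue_rug"}
print(f"empty basket: {pregnancy_score(set()):.2f}")
print(f"Jenny's basket: {pregnancy_score(jenny):.2f}")
```

Jenny’s fictional basket scores far higher than an empty one, even though no single item says “pregnant” on its own. The number is only as meaningful as the made-up weights behind it.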

We can learn a few things from this story:

  1. This case boiled down to a customer legally trying to keep information from her folks
  2. Fair or not, the onus here is on the teenager to get her own customer loyalty card, or not use one at all.
  3. When her family signed up for the cards in the first place, they agreed to a Terms of Service (ToS) agreement.

When users / customers agree to a ToS, they’re accepting a trade-off: special benefits or perks in exchange for data points about their habits. The key here is that they’re supposedly made aware of that trade-off. So what happens when the benefits are less than quid pro quo, and the ramifications of the data points collected go far beyond an awkward conversation at the dinner table?

Translational Research

The Target story above is just one example of the ways seemingly innocuous data points can paint a much larger, more telling picture. Part of what gets to me the most is that Target didn’t seem to distinguish between the different people sharing a rewards card: any other member of the family was just as likely to get those coupons.

Shared Devices, Shared Data, Shared Targeting

So what other shared devices do we use to collect data points? Basically all of them.

While the family computer may be a dying concept (RIP, my old friend), the truth is that many of our devices are now computerized, and as a result they send out data points for quality control and usage analysis. Our cars, cellphones, refrigerators, thermostats, and so on all send information on performance and use back to someone. How and when we use these technologies, and what for, can have major implications for what content we’re served, what coupons we get from Target, and the likelihood of getting a desirable loan or insurance rate.

I’m not going to get into the internet of things debate – let’s just say for now that Google, Facebook and Amazon are more in the business of managing the data raked in from all of the services they provide than the services themselves. Remember – we’re the product, not just the user.

A few notes on CRMs and “Customized Experience Content Marketing”

Have you ever been served an eerily accurate (or wildly inaccurate) set of ads on Facebook’s right-hand rail or in the News Feed? That’s Big Data happening.


Facebook seems to think my country music tastes translate to tastes in “Lonely Cowgirls”

This is all done through a large database, often called a Customer Relationship Management tool, or CRM. When you visit a webpage or complete a form (often in exchange for some free collateral, registration for an event or enrollment in a newsletter), a customer profile filled with information about your visit is established.

Later on, when a marketer wants to target a specific type of user (think ‘likely pregnant’ in our earlier example), and you fall into that group based on what you’ve clicked on, you’ll wind up getting materials specifically for that group. Let me explain:

With each data point collected (age, sex, location, topics of interest, time of day searched, device, operating system, etc.: basically anything users can be required to provide, plus the metadata their visits generate), you’re sorted into a bucket and then served ad content ‘tailored just for you’ throughout your trips around the internet.
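The bucketing step can be pictured as a set of rules run over each profile. The sketch below is a minimal Python illustration, not any CRM vendor’s actual API: the `Profile` fields, the segment names and the rules themselves are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A hypothetical CRM record: form fields plus metadata from each visit."""
    age: int
    location: str
    device: str
    visit_hour: int  # 0-23, local time of the visit
    topics: set[str] = field(default_factory=set)


# Hypothetical segment rules: each bucket is just a yes/no test on a profile.
SEGMENTS = {
    "likely_pregnant": lambda p: not p.topics.isdisjoint({"prenatal vitamins", "maternity clothes"}),
    "mobile_users": lambda p: p.device == "mobile",
    "night_owls": lambda p: 0 <= p.visit_hour < 5,
    "philly_over_50": lambda p: p.age >= 50 and p.location == "Philadelphia",
}


def assign_buckets(profile: Profile) -> list[str]:
    """Return every segment whose rule matches; ads are then picked per segment."""
    return [name for name, rule in SEGMENTS.items() if rule(profile)]


visitor = Profile(age=56, location="Philadelphia", device="desktop",
                  visit_hour=2, topics={"maternity clothes"})
print(assign_buckets(visitor))
```

Notice that nothing in the rules checks who was actually at the keyboard. One person browsing maternity clothes puts the whole profile in `likely_pregnant`, which is the shared-card problem from the Target story all over again.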

We’re told that we’re handing over our data points in exchange for a more custom tailored experience (think coupons specific to your shopping habits). The truth of the matter is a little more complex.

This same concept is what got the Minneapolis teenager grouped in with others deemed likely to be pregnant. Just imagine that instead of just her shopping history, a CRM recording her web search and browser history was used to predict her needs and habits.

Figure 1: CRM market share

For the uninitiated, some of the more popular CRM tools brought to us by our favorite brands are the Salesforce Marketing Cloud, Microsoft’s Dynamics Marketplace, NetSuite’s SuiteApp.com, SAP’s Business ByDesign ecosystem, Sage CRM’s Partner Solution Source, SugarCRM’s SugarExchange and Zoho’s online Marketplace.

Cookies and Pixels and Tracking... oh my!


Little known fact: Cookie Monster regularly attends SMX East each year.

Not all tracking methods are as out in the open as the CRM web forms mentioned above. Cookies, the “small piece[s] of data sent from a website and stored in a user’s web browser while the user is browsing that website,” are usually used to store data about a user’s preferences from visit to visit; this is how we’re able to stay logged in to our accounts on sites like Facebook, Twitter, Amazon and eBay. Pixels (tiny, often invisible images embedded in a page) do a similar job by reporting each page load back to whoever hosts the image.

They tell the servers running a given website about your choices, actions, and most importantly that you’ve been there before.

What this ultimately means is that advertisers can use cookies and pixels to learn more about your activity online – not just what you’re looking at, but when you’re looking at it and how many times you’ve visited.

You know how you could swear that you’ve seen the same banner advertisement “follow you around the web”? Well, truth be told, it has.

Oversimplified, the process works like this:

  1. First you get a cookie for looking at that item on a page.
  2. That page works with a network of advertisers to display specific ads to owners of that cookie.
  3. This is done in the hopes that after repeat exposure, you’ll eventually decide to ‘convert’ (read as: download, purchase, subscribe, sign up, etc).
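To see the first two steps above in motion, here is a toy simulation in Python. It is a minimal sketch assuming a single ad network and a single browser: `AdNetwork`, the `ad_id` cookie name and the ad strings are hypothetical, and the one `SimpleCookie` jar stands in for the cookies a browser would send to the ad network’s domain on every site that embeds its ads. Step 3, the hoped-for conversion, is left out, as are the auctions and cookie syncing real networks add.

```python
import uuid
from http.cookies import SimpleCookie


class AdNetwork:
    """Toy third-party ad network that remembers what each cookie ID viewed."""

    def __init__(self) -> None:
        self.viewed: dict[str, list[str]] = {}

    def _cookie_id(self, jar: SimpleCookie) -> str:
        # Step 1: the first time we see this browser, hand it an ID cookie.
        if "ad_id" not in jar:
            jar["ad_id"] = uuid.uuid4().hex
        return jar["ad_id"].value

    def record_view(self, jar: SimpleCookie, item: str) -> None:
        """Called from the shop's page when someone looks at an item."""
        self.viewed.setdefault(self._cookie_id(jar), []).append(item)

    def pick_ad(self, jar: SimpleCookie) -> str:
        """Step 2: any partner site asks which ad to show this cookie's owner."""
        if "ad_id" not in jar:
            return "generic ad"
        history = self.viewed.get(jar["ad_id"].value, [])
        return f"ad for {history[-1]}" if history else "generic ad"


network = AdNetwork()
family_browser = SimpleCookie()  # one shared computer, one cookie jar
network.record_view(family_browser, "Star Wars t-shirt")  # Dad shops
print(network.pick_ad(family_browser))  # later, anyone on that browser
```

Because the network only ever sees the cookie, not the person, whoever sits down at that browser next gets the same ad. That matters a great deal once the item is a diagnosis instead of a t-shirt.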

I’ll go into cookie and pixel protection at a later date – but suffice it to say that:

  • As a content consumer, it is very possible to balance protecting your privacy from these types of advertising with taking advantage of useful user preferences.
  • As a content producer, it is possible to respect the privacy of your user base.

Health Care and Big Data

So imagine for a moment that instead of buying extra lotion and receiving coupons for diapers and baby powder, that Minneapolis teen was actually Joe Foobar: 56 years old, father of two, recently diagnosed with late-stage lung cancer.

Forgive my sudden morbidity – but this is actually pretty serious and unfortunately not far from the realm of possibility – as Penn Annenberg doctoral researcher Tim Libert explains in a recent case study, Your Privacy Online: Health Information at Serious Risk of Abuse:

There is a significant risk to your privacy whenever you visit a health-related web page. An analysis of over 80,000 such web pages shows that nine out of ten visits result in personal health information being leaked to third parties, including online advertisers and data brokers. – Tim Libert

So to recap: 9 out of 10 visits to a health-related web page (in a sample of over 80,000 pages) resulted in some or all of your data being tracked through a combination of marketing techniques, both intentionally and otherwise.

This includes but is in no way limited to:

  • Basic demographic information: age, sex, and to some degree location
  • Technical specs: operating system (OS X, Windows, Linux), mobile vs. desktop, IP address, screen size
  • Content: titles and categories of pages you visited, how often you visited and how long you stayed, any predefined actions (read: Google Analytics Events) you completed, how you got to the page (search vs. direct vs. referral), which social network you came from (if applicable), etc.
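Much of that metadata can travel in a single request. Below is a minimal Python sketch of the kind of URL a third-party tracking pixel might request when a health page loads. The `tracker.example` endpoint and the parameter names are hypothetical (real tags use their own), and in practice the tracker’s server also sees your IP address and, unless your browser blocks it, any cookie it set earlier; the sketch models neither.

```python
from urllib.parse import parse_qs, urlencode, urlparse

PIXEL_ENDPOINT = "https://tracker.example/pixel.gif"  # hypothetical third party


def build_pixel_url(page_url: str, title: str, referrer: str,
                    screen: str, user_agent: str) -> str:
    """Pack page details into the query string of a 1x1 image request."""
    params = {"url": page_url, "title": title, "ref": referrer,
              "sw": screen, "ua": user_agent}
    return f"{PIXEL_ENDPOINT}?{urlencode(params)}"


def what_the_tracker_sees(pixel_url: str) -> dict[str, str]:
    """Decode the query string the way the tracker's server would."""
    return {k: v[0] for k, v in parse_qs(urlparse(pixel_url).query).items()}


pixel = build_pixel_url(
    page_url="https://hospital.example/conditions/lung-cancer/stage-3-treatment",
    title="Stage 3 Lung Cancer Treatment Options",
    referrer="https://www.google.com/",
    screen="1920x1080",
    user_agent="Mozilla/5.0 (Windows NT 10.0)",
)
for key, value in what_the_tracker_sees(pixel).items():
    print(f"{key}: {value}")
```

None of these fields asks the visitor anything, yet together they describe the page, the device and how the visitor arrived.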

All of that ‘metadata’ can actually paint a pretty detailed picture. And you handed all of that information over without even realizing it, in many cases just by browsing.

So what does this mean?

Let me give you a scenario, returning to our imaginary friend Joe Foobar:

Joe returns home from an appointment with an oncologist, where he learned that, after years of smoking, he has stage 3 lung cancer. Understandably, he wants to learn more about the disease, its symptoms, risk factors, treatment options and survival rates before talking to his family about it. After all, his doctor mentioned something about family history and prevention.

Joe waits for his family to go to sleep and fires up the computer. He then visits a few websites he remembers hearing about in television and radio commercials. He goes in assuming they’re all credible sources of information, helpful websites with patients’ best interests in mind. After all, that’s how their messaging strives to portray them.

Digging around online that evening, Joe does the following:

  • Browses the pages about symptoms, risk factors, and outcomes data
  • Signs up for a few smoking cessation newsletters
  • Registers for a patient education seminar at a local hospital
  • Downloads a free guide to lung cancer treatment options

All of these sound rather harmless – until the next day Joe’s daughter logs on after school to do some homework and chat with her friends.

Now all of a sudden she’s getting advertisements for radiation therapy and thoracic oncology consultations – regardless of which sites she’s viewing. Joe’s daughter then gets worried and asks her brother and mother about it – all before Joe can come home from work and tell his family about it, the way he wanted to, when he was ready to do so.

I really do think that as people (users, webmasters and believe it or not advertisers) we can collectively work together to avoid that situation.

Blurred Lines: Educational vs Marketing Material

Part of what makes this whole thing so complicated is the recent trend known as “content marketing,” “advertorials” or “sponsored content,” which blends organic, explanatory, educational material (What is XYZ? What are its risk factors? What are my options for treatment?) with marketing material (Come to XYZ Medical Center for your treatment, exams, and follow-up care).

It would be one thing if the product or service we’re talking about were simple: a screen-printed t-shirt, tires for your car, a new celebrity gossip magazine, or arguably even a for-profit trade or technical school. Health care is not that simple.

Whether you’re window-shopping online for car insurance, a new laptop or hospice care, you’re likely to start by plugging your needs into a search engine.

Below, I’ll go into the different concerns raised by organic and paid search.

Organic Content, Friendly URLs and Search Engine Optimization

Search Engine Optimization (SEO) is the behind-the-scenes strategy content marketers use to make their websites more appealing to the ranking and relevance algorithms of search engines like Google, Yahoo and Bing. With an effective SEO plan, a website with quality content will rise in the search rankings for specific terms.

The practice is massively popular and wildly complex; it has emerged as a new industry that e-commerce, the news media, and mesothelioma lawyers collectively rely on for life support. Check out the sponsors list for your favorite “Search Marketing” exposition / trade show.

This is where I’d like to make a joke about the job title of ‘SEO Guru/Ninja/Jedi,’ but I think the corporate HR departments who had to agree to print that on a contract have lost enough sleep.

One commonly recommended SEO practice is the use of ‘Friendly URLs’: website addresses with natural language tucked in to appear more relevant to search engines.

For example, in a search for the term “Star Wars Funny T-shirts”, a website with the address www.Corey.com/shop/categories/shirts/men/star-wars.html will likely rank higher than www.FooBar.com/shirts/product12345 even if they’re selling the exact same product.
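Many content management systems generate friendly URLs automatically from a page title. The Python sketch below shows one common approach, assuming an English-language site that only wants ASCII letters and digits in its addresses; `slugify` is a hypothetical helper name, not any particular CMS’s function.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a page title into a lowercase, hyphen-separated URL slug."""
    # Decompose accented letters (é -> e + accent), then drop the non-ASCII parts.
    ascii_text = (unicodedata.normalize("NFKD", title)
                  .encode("ascii", "ignore").decode("ascii"))
    # Collapse every run of characters that aren't letters or digits into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", ascii_text).strip("-")
    return slug.lower()


print(slugify("Star Wars Funny T-shirts"))
print("www.Corey.com/shop/categories/shirts/men/" + slugify("Star Wars") + ".html")
```

The second line rebuilds the Corey.com address from the example. The same function would just as happily turn “Breast Lumps: What You Should Know” into `breast-lumps-what-you-should-know`, which spells the topic out for anyone who sees the URL.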

The flip side of this is that friendly URLs also tell marketers which topics are most popular within their own websites. In the example above, the owner of FooBar.com would have a harder time understanding which t-shirts are bringing traffic to their website.

Again, Tim Libert explains what he found in his research:

Investigation of [over 80,000 health-related web pages] revealed that 70% of [SEO-friendly addresses] contained information exposing specific conditions, treatments, and diseases.

The same thing can be said for other common website building blocks that use natural language – titles, headers, section and category names, etc. To make matters worse, these elements can appear in your browsing history and favorites – which is especially troubling for users sharing computers.
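Here is a minimal Python illustration of why that matters: any party that receives a page’s full address (in a pixel request’s query string, say, or a Referer header where the browser still sends the full path) can read the condition straight out of the slug. The watch list of terms is hypothetical, and this is not the method Libert used in his study.

```python
import re
from urllib.parse import urlparse

# Hypothetical watch list; a real data broker's vocabulary would be far larger.
SENSITIVE_TERMS = {"cancer", "lung", "breast", "lumps", "hiv", "pregnancy", "depression"}


def terms_exposed_by(url: str) -> set[str]:
    """Return the watch-list words that can be read straight from a URL's path."""
    path = urlparse(url).path.lower()
    # Split the path on slashes, hyphens, underscores and dots into slug words.
    words = set(re.split(r"[/\-_.]+", path))
    return words & SENSITIVE_TERMS


print(terms_exposed_by("https://hospital.example/conditions/breast-lumps/what-to-expect.html"))
print(terms_exposed_by("https://hospital.example/pages/12345"))
```

The friendly address gives away `breast` and `lumps`; the FooBar-style numeric address gives away nothing, but it also does nothing to help a worried searcher find the page.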

When I asked Yevgeniy Levich, a well-qualified SEO professional, about this pickle, he raised a very valid point. To paraphrase:

Friendly-URLs are there to help users find the content they need, quickly. In the case of a user searching for ‘breast lumps’, it is really in their best interest if the URL contains the phrase “breast-lumps”. You can’t NOT call the page breast lumps if you want people to learn about the topic ‘breast lumps’ from it. The debate then becomes one of [conversion driven content marketing] versus patient education and if the same strategies should apply to both.

Looking at the competitive landscape is a bit depressing – we’ve locked ourselves into a situation that demands health care organizations use SEO strategies to stay relevant and appealing to search engine users.

In fact, the blog Search Engine Watch reported on a 2013 study showing, “the top listing in Google’s organic search results receives 33 percent of the traffic, compared to 18 percent for the second position, and the traffic only degrades from there…”

The graph below shows how traffic drops considerably when pages rank anywhere but the first page. This means content producers have to work extra hard to stay on top, often valuing search engine results page (SERP) ranking over content quality and user privacy.

Sites listed on the first Google search results page generate 92% of all traffic from an average search. When moving from page one to two, the traffic dropped by 95%, and by 78% and 58% for the subsequent pages.
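Those percentages compound quickly. The short Python calculation below assumes each quoted drop is measured against the page before it (the caption doesn’t say so explicitly) and starts from page one’s 92% share.

```python
# Page one's share of traffic, taken from the caption above.
shares = [92.0]

# Assumption: each drop is relative to the previous page's share.
for drop in (0.95, 0.78, 0.58):  # page 1->2, 2->3, 3->4
    shares.append(shares[-1] * (1 - drop))

for page, share in enumerate(shares, start=1):
    print(f"Page {page}: {share:.1f}% of traffic")
print(f"Pages 1-4 combined: {sum(shares):.1f}%")
```

Under that reading, page four ends up with well under 1% of the traffic, and the first four pages together account for roughly 98%.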

Cost Per Click and the Privacy Perils of Paid Search Tactics

Often, when you’re searching for content on Google, Facebook or Twitter, sprinkled in (or around) your results will be a few items that stand out as ‘promoted’ or ‘sponsored’. This means that the owners of that content have paid to have it appear among other natural, organically ranked results.

The main difference here is that organic content ranks higher because an algorithm determines it is relevant to a search term – while paid content sneaks in here and there thanks to an advertising budget (and is marked as such). Notice below that of the 5 main results, the first 3 are “paid”.


Search Results for “tshirts”


Facebook Ad Placement via Solo-e.com

So to quote my dear friend Sam Cooke, let’s Bring it On Home here:

  1. User performs a search for content on Google / Facebook / etc
  2. User clicks on a ‘clearly marked’ paid search result
  3. User navigates to a paid search result page and collects a cookie/pixel/etc.
  4. User leaves the page, and continues to see advertisements for related products and services across the web

And this, friends, is where it gets really hairy.

Respecting the Privacy and Autonomy of the User (not a Tron joke)

I also suppose this is where I take it personally – both as an active user and content producer online.

As a user, I should be able to assume that I’m not going to have “Hey, do you want XYZ treatment at ABC Medical Center?” following me around the internet.

As a content producer I should assume that my users (like myself, my family and friends) are real people and that online interactions have real-world ramifications.



Achieving a balance of user privacy and effective advertising (especially when it concerns health care) doesn’t have to be difficult. Many health care systems have internal marketing departments (as opposed to outsourcing to a brand management company or an SEO content mill). As such, they control most of the messaging they push out, both directly, through the content that gets produced, and indirectly, through the combination of marketing methods applied in specific circumstances:

If you’re an agency with the bottom line as your top priority, maybe it’s all right to drop pixels and cookies on users when they’re browsing “about our product” pages. But when you’re a nationally respected brand setting up shop in the digital marketing space, you have to respect your current and prospective users’ right to privacy, and I’d argue especially in cases related to health care.
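In practice, that line can be drawn in a site’s page templates. The Python sketch below assumes a template that asks which tracking tags to load for each type of page; the page types and tag names are hypothetical stand-ins, not any tag manager’s real configuration.

```python
from enum import Enum


class PageType(Enum):
    PATIENT_EDUCATION = "patient_education"
    CONDITION_INFO = "condition_info"
    ABOUT_OUR_SERVICES = "about_our_services"
    CAREERS = "careers"


# Hypothetical tag names; substitute whatever your site actually loads.
FIRST_PARTY_TAGS = ["first_party_analytics"]
THIRD_PARTY_TAGS = ["ad_network_pixel", "social_media_pixel", "retargeting_cookie"]

# Allow-list: only these page types may load third-party trackers.
THIRD_PARTY_ALLOWED = {PageType.ABOUT_OUR_SERVICES, PageType.CAREERS}


def tags_for(page_type: PageType) -> list[str]:
    """Return the tracking tags a page template should load."""
    if page_type in THIRD_PARTY_ALLOWED:
        return FIRST_PARTY_TAGS + THIRD_PARTY_TAGS
    return list(FIRST_PARTY_TAGS)  # health info pages stay first-party only


for page_type in PageType:
    print(f"{page_type.value}: {tags_for(page_type)}")
```

Using an allow-list rather than a block-list means a page type nobody has thought about yet (a new clinical trials section, say) defaults to the more private option.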

I’m not advocating for a marketer Hippocratic Oath / Magna Carta combo, though that would be pretty nice.

MEDIA LITERACY SOAPBOX WARNING: As the roles of user and content producer begin to merge, the veil starts to lift, and tactics that blatantly take advantage of the public at large will hopefully have no choice but to adjust or perish. (Yeah, right. But a man can dream, right?)

It’s one thing if users click an ad that is legally, clearly marked as such and you drop a pixel on them. It’s another to prey on vulnerable people looking for medical info at 3 a.m. when they’re out of other options.

I guess my point here is that simply seeking health care information shouldn’t warrant the full arsenal of digital tracking methods. As brands, as trusted thought leaders, we can do better than that. We can draw a line.

All it really takes is one infosec audit to make a major mockery of your advertising practices. If your brand has quality, engaging, organic content that leads users to make their own choice to convert instead of being led blindly down the funnel, you should be damn proud of that work.

If we’re really being honest, many older patients, and plenty of average users, have zero idea what I just said over the last 3,537 words. There’s an expectation of privacy, an expectation of respect for users’ and patients’ choice to turn to a given brand. Don’t screw that up.

Just to be blunt, it’s not worth taking the cheap shortcut in search of potential dollars if it means risking the loss of your brand’s standing and reputation.

While the onus is on the user to use an ad blocker, to understand the media landscape, and to read the privacy statements and ToS, not everyone has the opportunity to do so, or even knows that they should (H/T to HBO’s Last Week Tonight with John Oliver).

For some further reading on just how hard it is to keep medical conditions like pregnancy private from Big Data, I’ll leave you with a detailed account of Janet Vertesi’s experience as seen in Forbes. “The Princeton sociology professor treated her impending birth as many people might treat online criminal activity. She paid for maternity clothes in cash, insisted friends and families not discuss the bump on Facebook, surfed baby sites only with the Tor browser (which masks a user’s IP address), and used a code language to talk about the baby with her husband via text message…”

So dear friends – what do you think? Let me know in the comments or complain about me on Reddit. Either way – have this discussion.