6 days ago

Data Engineers Will Be More Important Than Data Scientists

Does it seem like the ability to find, hire and retain data scientists is a losing battle? Is spending $500K+ per year for a Data Scientist worth it? What is a data scientist anyway? Those are real questions and are the markers that how you are supportin…

1 month, 6 days ago

Help Wanted: Data Innovation For The Data Economy

Thomas Edison once said, “The value of an idea is in the using of it.” Today (many of) those “ideas” are data and the insights derived from them, and it remains true that their value is in how they are used. Simply put, data + use = value. Insights-driven companies use these data-derived insights in […]

4 months, 20 days ago

Rhyme or Reason – The Logic of Netflix

@GuyLongworth, who teaches philosophy at Warwick, is puzzled by the Netflix recommendation algorithm.

Having seen both, I can only think that this must have to do with rhyme.

— Guy Longworth (@GuyLongworth) June 29, 2017

Philosopher Guy’s appeal to rhyme rather than reason seems to be based on the view that the two films have nothing else in common. But this is rather contradicted by the fact that he has actually seen both. Netflix has correctly surmised that people like Guy might possibly be interested in both films.

The first thing to understand about recommendation algorithms is that they are not solely (if at all) based on the intrinsic similarity of two products, but on what we might call relational similarity. If I tell you that people who like pizza also like ice-cream, that is primarily a statement about the “people who like”. You might try to explain this statement by observing that pizza and ice-cream both have a high fat content, but then so do lots of other foods.

And when someone has just eaten a pizza, it is perhaps more likely that they will go on to eat ice-cream next, rather than eating another pizza straightaway.

Would it be virtue signalling of me to reveal that I resisted the lure of the second pizza?

— Guy Longworth (@GuyLongworth) June 22, 2017
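To make relational similarity concrete, here is a minimal sketch in Python. It is not Netflix's method; the function name `relational_similarity`, the sample people, and their tastes are all hypothetical. It scores a pair of items purely by the overlap between the sets of people who like them (Jaccard similarity) and never looks at any property of the items themselves.

```python
from collections import defaultdict
from itertools import combinations


def relational_similarity(histories):
    """Score item pairs by overlap of the people who like them (Jaccard)."""
    # Invert "person -> items" into "item -> people who like it".
    fans = defaultdict(set)
    for person, items in histories.items():
        for item in items:
            fans[item].add(person)

    scores = {}
    for a, b in combinations(sorted(fans), 2):
        shared = fans[a] & fans[b]
        everyone = fans[a] | fans[b]
        scores[(a, b)] = len(shared) / len(everyone)
    return scores


# Hypothetical tastes, invented for illustration only.
histories = {
    "ann": {"pizza", "ice-cream"},
    "bob": {"pizza", "ice-cream", "salad"},
    "cat": {"salad"},
    "dan": {"pizza", "ice-cream"},
}

scores = relational_similarity(histories)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

In this toy data, pizza and ice-cream come out as highly similar only because the same people like both, which is a statement about the "people who like" and not about fat content.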

The second thing to understand is that recommendation algorithms work by trial and error. Netflix wants to know if Guy will accept its suggestion to re-watch Annie Hall, and this feedback will add to its knowledge of Guy as well as its knowledge of relational similarity between films.

Trial and error works better if you have a diverse range of trials. If you watch a couple of films in a particular genre, and then Netflix only ever shows you suggestions within that genre, it will never discover that you might be interested in a completely different genre as well. And you will never discover the full range of Netflix offerings, which could result in your abandoning Netflix altogether.

Diversity of suggestion adds to the richness of the experimental data that are generated. How many members of the “people like Guy” category respond positively to suggestion A, and how many to suggestion B? Todd Yellin, Netflix VP of Product, told journalists in March that “we are addicted to the methodology of A/B testing”.
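The value of diverse trials can be sketched with an epsilon-greedy strategy, a standard textbook technique that is swapped in here for illustration; the source does not say that Netflix uses it. The genre names, acceptance rates, and function names below are assumptions. With `epsilon` set to zero the recommender only ever repeats its current best guess; with a small positive `epsilon` it occasionally tries something else and learns from the response.

```python
import random


def choose_genre(stats, epsilon, rng):
    """Explore a random genre with probability epsilon, else exploit the best."""
    genres = sorted(stats)
    if rng.random() < epsilon:
        return rng.choice(genres)

    def acceptance_rate(genre):
        accepted, shown = stats[genre]
        return accepted / shown if shown else 0.0

    return max(genres, key=acceptance_rate)


def record_feedback(stats, genre, accepted):
    """Update (accepted, shown) counts after one suggestion."""
    accepts, shown = stats[genre]
    stats[genre] = (accepts + int(accepted), shown + 1)


def simulate(epsilon, rounds=500, seed=0):
    rng = random.Random(seed)
    # Hypothetical hidden preferences of one viewer (not real data).
    true_rate = {"comedy": 0.3, "documentary": 0.7, "thriller": 0.5}
    stats = {genre: (0, 0) for genre in true_rate}
    for _ in range(rounds):
        genre = choose_genre(stats, epsilon, rng)
        record_feedback(stats, genre, rng.random() < true_rate[genre])
    return stats


greedy = simulate(epsilon=0.0)   # never tries anything new
curious = simulate(epsilon=0.2)  # one suggestion in five is an experiment
print("greedy: ", greedy)
print("curious:", curious)
```

The purely greedy run never shows a documentary or a thriller, so it cannot find out that this viewer would have preferred them. The exploring run pays for some rejected suggestions but gathers data on every genre.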

What is genre anyway? In the past, genres (in book publishing, music, film, video games) were defined by the industry or by experts. In 2013, Netflix employed over 40 people hand-tagging TV shows and movies. But a data-driven approach allows genres to emerge organically from the patterns of consumption. Netflix (and Amazon and the rest) will be much more interested in data-defined genres than in industry-defined genres.

In her rant against the Netflix algorithm, @mehreenkasana makes two apparently contrary complaints. On the one hand, Netflix offers her content that is nothing like anything she has ever watched. She dismisses one suggestion with the words “I’ve never watched a show in a remotely similar vein.” On the other hand, she doesn’t see how Netflix can offer her challenging experiences. “Intensely curated experiences, whether you’re looking to explore movies or to meet people to date, remove one of the most critical aspects of a rich experience: risk, as in going out of your comfort zone.”

But as @larakiara explains, “personalization is key to ensuring users keep coming back. But there’s also the problem of over-personalization, so Netflix has to introduce variants.”

Thus we can see Netflix as an embodiment of at least three of @kevin2kelly’s Nine Laws of God.

  • Control from the bottom up
  • Maximize the fringes
  • Honor your errors

“A trick will only work for a while, until everyone else is doing it.” (Remember Blockbuster.)


Mehreen Kasana, Netflix’s recommendation algorithm sucks (The Outline, 24 March 2017)

Kevin Kelly, Nine Laws of God. Chapter 24 of Out of Control (1994)

Lara O’Reilly, Netflix lifted the lid on how the algorithm that recommends you titles to watch actually works (Business Insider, 26 February 2016)

Janko Roettgers, Netflix Replacing Star Ratings With Thumbs Ups and Thumbs Downs (Variety, 16 March 2017)

Tom Vanderbilt, The Science Behind the Netflix Algorithms That Decide What You’ll Watch Next (Wired, 7 August 2013)

1 year, 3 months ago

Insights-Driven Businesses Are Stealing Your Customers

Is your business digital? Like Domino’s Pizza, do you realize that you are not a product or service business, but that you are a software and data business that provides products or services? Do you exploit all of your customers’ data to know them insi…

1 year, 4 months ago

Digital Health, Marketing & Analytics in Northern Virginia, DC

The recent Northern Virginia Technology Council (NVTC) Healthcare Informatics & Analytics Conference, at the Inova Center for Personalized Health, was a huge success. Oracle’s own David Dworzczyk (Ph.D.), representing the Oracle Health Sciences business, generated enthusiasm and interest in how a focus on big data and analytics is becoming embedded across the entire health sciences ecosystem, from pharma trials and policy to healthcare delivery and genomics research.

1 year, 8 months ago

kCura Puts the CAAT Into The Bag . . . Acquires Long-time Partner Content Analyst Company

We’ve seen another acquisition in the shifting eDiscovery market this week as kCura, the developer of Relativity, announced its acquisition of Content Analyst Company, the brains behind the CAAT analytics engine (kCura’s press release is here). The acquisition is not entirely surprising. kCura has been relying on the CAAT engine to power its analytics offering for eight years. According to kCura, use of its Relativity Analytics offering “has grown by nearly 1,500 percent” since 2011, and more than 70% of kCura’s current customers hold licenses.

What does this acquisition mean for kCura, its customers, and Content Analyst Company customers?

This is more than just one vendor acquiring a partner to bring its tech in-house. The markets kCura competes in are changing. Customers want better predictive coding workflows, reporting, and visualization capabilities. The momentum around technology-assisted review (TAR) in eDiscovery is growing globally. In February 2016, the Pyrrho Investments Limited v. MWB Property Limited case gave the green light to predictive coding software in the UK, with the decision (PDF) citing acceptance in the US and other jurisdictions. Interest in and adoption of analytics for eDiscovery and other investigative use cases will only grow. Now that machine learning and technology-assisted review processes have been OK’d by the courts, many of the objections to using software for automated categorization, security classifications, and other analysis of textual data will dissipate.


1 year, 9 months ago

Big Data & Analytics in Northern Virginia, DC Area

Big Data, Analytics & Data Science are taking off as regional economic development catalysts – and outcomes – around the world, and particularly so here (DC/MD/Northern Virginia) in what some call the “Big Data Capital” of the US (given the proximity and engagement of so many commercial, federal/state government, nonprofit and startup organizations in this field). Here are a couple of examples of activities going on in the area.

1 year, 10 months ago

Big Data Analytics – Unlock Breakthrough Results: (Step 5)

In this step we develop one of the most important interim products for our decision model: the analytic user profile. A profile is a way of classifying and grouping what the user community is actually doing with the analytic information and services produced. We develop a quantified view of our user community so we can evaluate each platform or tool for optimization quickly and produce meaningful results aligned with usage patterns.

1 year, 10 months ago

Big Data Analytics – Unlock Breakthrough Results: (Step 4)

Continuing the series on Nine Easy Steps to Unlock Breakthrough Results, we now assign relative weights to each of the critical capability groups for each operating model uncovered earlier. This assigns the higher weightings to the capability groupings most important to the success of each model. Having the quantified index means we can evaluate each platform or tool for optimization quickly and produce meaningful results.
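As a rough illustration of how such a quantified index might be computed, the sketch below takes a weighted average of per-capability scores. The excerpt does not give the series' actual formula, so the weighted average, the capability group names, the weights, and the platform scores are all assumptions made up for this example.

```python
def weighted_index(weights, scores):
    """Weighted average of capability scores; a higher weight counts for more."""
    total_weight = sum(weights.values())
    return sum(weights[group] * scores[group] for group in weights) / total_weight


# Hypothetical weights for one operating model (higher = more important).
weights = {"ad hoc query": 5, "reporting": 3, "data mining": 2}

# Hypothetical capability scores for two candidate platforms (1 to 5 scale).
platforms = {
    "platform_a": {"ad hoc query": 4, "reporting": 2, "data mining": 5},
    "platform_b": {"ad hoc query": 2, "reporting": 5, "data mining": 3},
}

index = {name: weighted_index(weights, s) for name, s in platforms.items()}
ranking = sorted(index, key=index.get, reverse=True)
print(index)
print(ranking)
```

Because the weights differ per operating model, the same platform scores can produce a different ranking for a different model.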