SRCCON 2018 • June 28 & 29 in MPLS

Session Transcript:
Are you running an experiment, or are you just winging it?

Session facilitator(s): Sarah D Schmalbach, Sasha Koren

Day & Time: Friday, 4:15-5:30pm

Room: Ski-U-Mah

Sarah: We are going to get started in a couple minutes.

Sasha: Hi.

How is everyone? How are you really? Thank you guys for that yesterday; that was the session Jennifer and Andrew led: when you ask someone how they are, ask them how they really are. I am Sasha Koren, and until April I was the co-lead of the Mobile Innovation Lab at the Guardian US. I am taking a break but will do something again soon enough.

Sarah: I am Sarah Schmalbach and I co-led the lab with Sasha. I am working with a local spin-up trying to solve problems.

Excited to do this retro and share with you some of the processes that Sasha and I learned through trial and error over two years of experimentation. We think of this as just one way to go about it: a couple of practical tips and things that worked for us, but we are here to learn about the types of experiments you have run as well. Glad to have this venue to share it in.

Sasha: We will set this up so you know what is coming. We will do half an hour introducing a methodology we came up with in the lab as we were figuring out how to experiment with intention. We will then pass out worksheets. It is very analog, but if it is easier to take notes on a computer or Post-its, that is totally fine.

We will have you working in groups by table, so at some point you might want to rearrange yourselves, but you are all good for now. Then we will leave 15 minutes at the end to discuss the ins and outs, have a general discussion of some of the concepts we have introduced, and see what comes to mind.

So when we were at the Mobile Innovation Lab, part of this came about because we had both been in more or less traditional organizations, and an experiment would amount to something like: let's do something for the World Cup. How about we do X? Great.

It is an experiment. We would go do it because it seemed like a great idea, according to us or according to someone, and then we would look at metrics, and it seemed like a lot of people came to the page, so that is good. Then we would move on, and we would not necessarily have gotten any insight into what our users thought, or whether the idea itself actually had value beyond our own sense of editorial or technology judgment. There are a lot of these missed opportunities.

When we are talking about trust and engaging with users in a more direct way and involving them in our processes, experimentation gives us a really big opportunity to actually change the way we do things to make them more intentional and user-centered.

So Sarah, do you want to speak to this slide?

Sarah: We decided to kick it off with the question of what an experiment is, because we had a pretty particular way of thinking about what was and was not an experiment in our innovation lab. This is sometimes how I feel people think I look when I talk about experiment ideas.

We just wanted to pitch it back out to you. Have you run any experiments lately?

Anyone willing to share?

I think the experiments we run most often are A/B tests for modules or different components.

Those are distinguished in that you have two versions and you compare them across a given set of metrics.

Sarah: That is great.

Sasha: Anyone else? Recent experiments? Anything you have done in the past that might qualify as an experiment. Tyler?

In our live coverage format we were experimenting with the idea of introducing a new kind of content unit or card. We asked readers whether or not it was something that was useful, let them react and give feedback, and they overwhelmingly said they didn't want it, and we were more than willing to kill the idea.

Sasha: Experimentation that tells you something you don't need to be doing anymore.

Any thoughts on a definition of experimentation?

What qualifies as an experiment? We can start off with one of our own. The little boy is going to keep winding around this slide so, you know, if you want to space out on that, that is cool.

The one that – we had a very unusual setup in the Mobile Innovation Lab, where we basically were told to experiment with mobile storytelling formats and had an open mandate to decide what that would be, how it would go, and what we would learn from it.

We had to learn something and then share what we learned.

We came up with some criteria. The idea didn't need to be brand new in order to be an experiment, but we needed to be able to measure success in some way according to the user's reaction and the user's perception rather than our own.

We wanted to understand, in that regard, how the actual product or feature was being used. The definition of success we came up with was simple: Was it useful? Was it interesting?

This isn't about what we thought but what we wanted to hear from our users.

This is the methodology we will walk through. It is a five-step method. It is a bit dense, so we will stop periodically to see if you have questions, but raise your hand if we are talking and you need clarification.

The first piece is to draw a line between an idea and a hypothesis. An example might be something like – do you want to talk?

Sarah: I want to highlight this. The talk we are giving is a microcosm of the hypothesis-driven design document that just came out of the NPR newsroom. I have a link to it in the Etherpad because it is impressive and walks through these steps in a more detailed way. I want to put in a reminder that a lot of us haven't encountered hypotheses since grade school, but Wesley at NPR had a quick way of saying why hypotheses matter. He said it might conjure up images of science fair projects, but the purpose of a hypothesis is to offer an explanation for a phenomenon which you then test. It felt worth repeating that piece, and it explains why this is part of our methodology.

Sasha: So it sort of goes like this. The idea, in a nutshell, is a product or feature, something you think will be interesting and of value to the audience, that you think they might like or get something out of. The hypothesis is an additional step around that: the product will solve a problem for a particular audience doing certain kinds of things, and we will know the product or feature solves that problem when we can measure it.

When we can measure these things. This is a little bit abstract. We will put all these slides up on the Etherpad too.

They may already be up there.

One of the experiments we ran was notifications: live-updating medal alert notifications for the Olympics.

Some were alerts for specific countries the users opted in to; one was a leaderboard of the top medal countries. A daily leaderboard provides a summary of the competition and the ability to read and ingest the results coverage in a small format. We will know it works if engagement is high and our users tell us it is useful. We will get to where they tell us in a little bit. Does that make sense? Any questions? Okay.

Sarah: Cool. After you have your hypothesis (this all gets done up front), a very simple next step is to define success metrics based on all aspects of the user experience. I will go straight into that because I think a lot of us are used to defining success on page views and time spent.

One of my favorite things about working in the lab was we really brought the user’s experience into our definition of success. So this is kind of just reiterating that, right?

What happens when higher engagement with something doesn't equate to a better experience? If you are doing a notification-based experiment and someone doesn't need to click through to a page view, how do you start thinking about whether it was a positive experience for people?

We built a new metric into our system. For the record, we worked with an agency in Philadelphia called Here Digital; if you are looking for people to walk you through success metrics, they are wonderful. They pitched this net interaction rate, which is interesting. It helped us think about an experiment and all the ways people might engage with it, and we as a team would code each way someone might touch it or play with it or engage with it as a positive or negative experience. Then you do this simple math: positive engagements minus negative engagements over total opportunities to engage, and you get a number that tells you whether overall it was a positive or negative experience for people.
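
As a rough sketch of that arithmetic (the numbers below are invented, not the lab's data), the calculation might look like this in Python:

```python
def net_interaction_rate(positive, negative, opportunities):
    """(positive engagements - negative engagements) / total opportunities to engage."""
    if opportunities == 0:
        return 0.0
    return (positive - negative) / opportunities

# Hypothetical example: taps and action-button presses coded as positive,
# dismissals coded as negative, out of 1,000 delivered notifications.
rate = net_interaction_rate(positive=180, negative=60, opportunities=1000)
print(f"Net interaction rate: {rate:.0%}")  # 12%
```

Recoding an interaction from negative to neutral, as the lab later did with dismissals, simply drops it from the numerator, which is why the same data can tell a different story.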

It scratches that quantitative need to have a number but takes into account how people feel when they are engaging with something. We just wanted to give you one example of a new way to measure something, and then show you how it played out for this particular experiment, which Sasha mentioned. This is a quick screenshot of what you would get every night at 11:30 Eastern time during the games. It was a reminder.

People opted in if they came to the Guardian site. They are already a captive audience.

They could sign up for a couple of things: one was an alert at the end of the day, since people pay attention to which countries are on top, and a ping if something interesting happened, in case you wanted to read the coverage.

I want to show you the trackable interactions built into the rate. The first is tapping on the notification itself. The second thing someone could do is use what are called action buttons: you could tap the leaderboard button, taking you to the full leaderboard on the Guardian site, or you could tap manage updates, which is where we allowed people to turn these off. Or you could swipe the alert away. Those are the things we would be able to track.

And this is how it played out over the two weeks of the Olympics. Just to reiterate, what we coded as positive was if someone tapped on the alert, the leaderboard button, or manage updates, and what we considered negative was if they dismissed the alert. We were in the old mindset that if someone swipes something away, that is bad. We will get to why that is maybe not the best way to code the interaction.

As you can see, over time the net interaction rate was going down, because over the course of the two weeks people were dismissing the notification more than they were at the beginning, and we saw this result.

We started thinking about whether dismissing an alert is actually a negative interaction, and this is the place where I think a lot of people get stuck and think finding this out is impossible, that we will never know what our readers think.

We deployed surveys at the end of the experiments, simple Google Forms. We would sit down and talk about what the quantitative metrics were not going to tell us and put together a 10-12 question survey to chip away at what was happening. If you dismissed the alert, what was the common reason? This wasn't perfect. We saw 25-30 percent of participants getting back to us.

It was anecdotal but helped give us context for the numbers.

We ended up recoding swiping to dismiss as neutral.

We tried hard to get our analyst to report it as positive. She is not here. Poor Lynnette. It came out of the equation entirely, and based on this we saw an entirely different result. People were getting the information they needed and swiped the alert away because they didn't need it anymore. That was a nice moment for us in terms of measuring the success of an experience.

Did the graph go up?

Sarah: It is interesting with experiments because we expected people to have bad experiences.

They are trying it for the first time and haven't encountered this before. All we were looking for was anything over zero. We tried this new thing, applied new technology to news: was it a positive experience? It was a 14 percent net interaction rate, I think, meaning we kept doing live notifications. That is how it went.

Any questions on that?

Yeah, go ahead.

If this is an optional survey, how do you account for the fact that the people who opt in are already more engaged, and that might affect the curve you see?

Sarah: I think that gets back to the goal of our experiment, which might be different from others', which was to say: does this have value to anyone? Is this way of displaying news more useful than another way they would get leaderboard results?

We never took one metric as the metric; we could talk for hours about the framework we had. We looked at a lot of different signals.

Getting analysts to track this can be difficult. We had the luxury of implementing precise tracking in the lab. Our analytics agency put this together and talked to us about high-level goals and how we would quantitatively and qualitatively track them. They would hand us a beautiful set of documents that explained what we would put in Google Analytics, so we could track everything people were doing in our notifications or experiments. Qualitatively, we would brainstorm the types of survey questions we wanted to ask. All of this is done before launching. Oftentimes these discussions would sway the way we did or labeled something, based on the way we wanted to track it.

Would you weight the two equally in terms of quantitative and qualitative, or does it depend on the experiment?

Sarah: That is a tough question, but I think in my experience, and this might be different for other folks, people tend to pick the one that tells the story they want to tell. Sounds like you have experienced that before.

That is challenging, and I think that is why we felt it is not about less or more, it is just about taking both into account. For us, I guess, qualitative mattered a bit more because it was about experience, and engagement only tells you so much.

People coming to your page will let you know something but we want to know if these are useful paths for technology as it relates to news.

Sasha: I think, too, it was really important for us to have a holistic view rather than weighing one versus the other.

We wanted to get both perspectives and read them against each other. If the qualitative told us everything was amazing and the quantitative told us no one did anything, that didn't actually happen, but it would be a weird scenario.

Probably means a bug in your analytics which are very common.

I have trust issues.

Sasha: Maybe not a great example, but just that we needed to reconcile the two together. The net interaction rate was a great example of how qualitative and quantitative worked hand in hand to tell us something about how a feature was being perceived and used that we had misinterpreted.

I probably should have said this earlier, but this is an overview of what we did and it is not meant to be prescriptive. If you were to try out elements of this, it might look very different for the scenario you are working in, for the organization you are working in. This slide: we didn't actually use Parse.ly, but we included it to say here are some possibilities and ways you could go about reading metrics, depending on the tools you might or might not have in your newsroom. Maybe you are using Chartbeat or GA or something else.

Also, we used Google Forms. Maybe there is another way? Maybe there is a business insights group where you work and they have a survey tool they have already paid for that you can actually use. Maybe not.

Were there any times when you were looking at quantitative metrics and had concerns about the privacy of your users, or lawyers had concerns about the privacy of users, about what you were collecting on the page and what they might be inputting on the page?

Sasha: We were careful to say we are not collecting data to share with anyone. We didn't ask for personally identifiable information, except for the occasions when we optionally asked for an e-mail if they were willing for us to contact them for interviews or something like that.

Nothing custom on the GA side?

Sarah: No, it was mostly custom dimensions. We did a complex thing in alerts where it would be an explainer of the jobs report: it would say here is the unemployment percentage and the jobs added this month, do you want the good news or the bad news first, and the two action buttons would be a thumbs up or down. We needed to see where people dropped off, and that is a level of granularity you don't normally have in a notification; that is how we used more custom dimensions. But I don't think we ever tracked location or picked up any personally identifiable information, with the exception of the lab reader, which is a different experiment. We talked to the Guardian about privacy.
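
For a sense of what that kind of event tracking can look like, here is a hypothetical sketch (not the lab's actual implementation; the tracking ID, dimension slot, and field values are placeholders) that logs a notification interaction to Google Analytics with a custom dimension via the Universal Analytics Measurement Protocol:

```python
import requests

def track_notification_interaction(client_id, action, variant):
    """Log a notification interaction as a GA event with a custom dimension.

    Hypothetical sketch; the tracking ID and dimension slot (cd1) are placeholders.
    """
    payload = {
        "v": "1",                # Measurement Protocol version
        "tid": "UA-XXXXXXX-1",   # placeholder GA tracking ID
        "cid": client_id,        # anonymous client identifier (no PII)
        "t": "event",            # hit type
        "ec": "notification",    # event category
        "ea": action,            # e.g. "tap", "dismiss", "thumbs_up"
        "cd1": variant,          # custom dimension, e.g. "good_news_first"
    }
    requests.post("https://www.google-analytics.com/collect", data=payload, timeout=5)

# Example: record that a reader tapped thumbs up on the "good news first" variant.
track_notification_interaction("555d1a2b", "thumbs_up", "good_news_first")
```

A dismissal would be sent the same way with a different event action, so the positive, negative, or neutral coding can happen later, at analysis time.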

One thing I want to mention about Jennifer's question: I forgot to say that the qualitative data oftentimes helped us do good product development, things like whether we were bothering people. Quantitative told us how this thing went.

Qualitative told us how we might improve in the future. When we did our experiment putting live data in notifications, we got people saying, I think this drained my battery. You know, this was updating every 2-3 minutes. We were just more sensitive to that, and maybe next time we changed our business rules and didn't have the alert swap out so much. So the qualitative data surprisingly tells you things that help you make the product better over time, which quantitative kind of can, but it is less of a pathway.

Great. Step four.

Sasha: Back to surveys. We will go through this quickly; we have a couple more minutes. We did use Google Forms. The ingredients, very briefly (there is an example of this linked in the Etherpad, the Olympics survey), were to ask basic demographic questions like age and what device you were using. I don't think we asked for location data much.

Sarah: We asked about loyalty, like are you a regular Guardian reader or not.

Sasha: We asked a mix of broad and granular questions. Did you like this experiment is an example of a very broad one. Granular: how did you feel about the number of links? Did you tap through? We can tell in aggregate whether they tapped through, but to learn why they didn't, we could follow up. We used screenshots to clarify the things we were talking about. We realized, after putting in way too many questions, that 15 seemed to be the sweet spot: just enough to keep people engaged and get a holistic view of things, but not so many that they dropped off and didn't finish.

One of the most interesting things to me was that we always left a box at the end that said, anything else you would like to tell us about this experiment, and people would write in. Not everyone, but I was very surprised (I should not have been surprised) that the act of asking, of opening up that space for users to tell us something about how they experienced what we were offering them, gave them the opportunity to tell us surprising things, like it drained my battery, and to give us more insight and clarity.

Sometimes it was really positive. My favorite one, after the alerts on the Brexit vote, unfortunately, was an 18-year-old who said, I am getting my credit card next week and I am going to pay for the Guardian.

That made people very happy inside the Guardian. I hope he is continuing to pay. It could also be not-so-positive: it didn't work for me, or you guys are so biased, or whatever. These were things that were helpful for us to know.

How are you distributing the survey to users?

When we did alerts it was through a notification, and that was the most common. Sometimes, if it was something we were doing on a mobile web page, we would put a link at the top of the page and follow up with an alert. A variety of ways. We had to get creative.

Sarah: For instance, when we did our live video player for the inauguration (you could tap on the live video and it would scroll to the left-hand side so you could keep engaging), we put a little link that said give feedback in the player itself and got people to click through there.

How good was your response rate?

Sarah: It was usually around 25-30 percent, but the audiences were small. For the first experiment we had 14 people opt in. For the largest experiment, putting live data notifications around the presidential election, there were 230,000 people. It is interesting because the methods of analysis changed: we could not go through all the free-form feedback, so our analyst put together a simple NLP algorithm to do sentiment analysis. It picked out words, scanned the free-form responses, and said overall the feedback was positive. I get really into this, but they could show us the most positive and most negative responses and call it a day.
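
As a rough illustration of that kind of pass (a sketch, not the lab's actual approach; it assumes NLTK's VADER sentiment analyzer and uses invented responses), it might look something like this:

```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # requires nltk.download("vader_lexicon")

def summarize_feedback(responses):
    """Score free-form survey answers and surface the extremes."""
    analyzer = SentimentIntensityAnalyzer()
    scored = sorted(
        ((text, analyzer.polarity_scores(text)["compound"]) for text in responses),
        key=lambda pair: pair[1],
    )
    average = sum(score for _, score in scored) / len(scored)
    return {
        "average_sentiment": average,   # above 0 suggests overall positive feedback
        "most_negative": scored[:3],
        "most_positive": scored[-3:],
    }

# Invented example responses, not real survey data.
print(summarize_feedback([
    "This was really useful on election night.",
    "It drained my battery.",
    "Loved the live updates, keep doing this!",
]))
```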

Nice.

Sasha: Okay. Let's move on to the final step here, which is a very important one: the retrospective. Gather everyone who participated, really everyone; if you can, get anyone who worked on any specific aspect of it into the room to talk about how it went for them and what they thought. The questions we used were very simple and revealing, and meant not to place any blame or make anyone feel overly burdened if something didn't go as well as we thought it might. We had the luxury of not having specific criteria or expectations for high traffic numbers. We just wanted to know how things went, and sometimes our experiments didn't hit the mark we thought they would. This is designed to evaluate without anyone feeling that anything we did was wrong.

Three questions. What went well?

What didn’t go well? And what could we do better next time?

Everyone gets the chance to talk. We tried to draw people out; if someone wasn't contributing to the conversation, we would try to elicit their thoughts. Sometimes people don't want to. We tried to make it a good, open space for people to say what they thought. That is about it.

Anything you want to add?

Sarah: No.

Sasha: Okay. I think what we are going to do now is try, as best as we can (it is a little bit of a conceptual exercise), but we have worksheets. For the people who are not at tables, it might be helpful if you found a spot at a table, if you can. I don't know if we will have enough seats at the tables, but we certainly have a couple here and a couple on this side. Once you have introduced yourselves to each other, as you are doing, and Sarah is done handing out the worksheets, we will have about half an hour. Sorry, we have a question in the back. Oh, just a worksheet.

There is a little bit more room at this table if you want to squeeze up. Or you can gather in the back; that is great too. The first thing to do is to pick an experiment topic. We have come up with examples here. You can also feel free to come up with your own if none of these appeal; these are just hypotheticals, and of course there is an infinite world of experiment topics. If you don't like to think about it in terms of an experiment, think of it as a project: any project you have done in the past or may do in the future. We will just work through it. We are going to circulate and see if there are questions. Raise your hand if you are getting stuck.

Hopefully, we will be able to get to everyone. One, two, three –

Sarah: Just a real quick thing, too. We have about maybe 25-30 minutes.

You will see it is a five-page document with the five steps.

There are boxes and prompts to have your team talk through each of those steps.

Feel free to write on whatever paper you want. If you want to use the Post-its go for it. We will give you a reminder you should be moving on to step two or three but move at your own pace. If you get deep in a hypothesis, wonderful. Hopefully you get further than that. Have a good time and thanks for participating.

Great!

[30 minute group exercise]

Sarah: You guys might start wanting to get to step two. It has been 5-6 minutes.

Sasha: If you haven’t moved on to step three, you should move on in the next minute.

Sarah: Or at your own pace.

Sasha: Does anyone need help?

Anyone stuck?

Okay.

We have gotten a little feedback that step three can be a bit confusing.

Just want to say it is more of a thought exercise about who would help you, since we don't have space in this particular setting to actually implement any analytics. Think about who would help you with your high-level goals.

Sarah: Step two is about your goals, what you want to know, and how you would measure it. Step three, which we could skip if we do this again, is just a thought experiment about who you would bring together. I know in every organization it is unclear how to implement these things, so it is just to get your brain going about who might help you run a survey and who might implement Google Analytics. You can spend less time there and get to step four, where you can start sketching questions for the qualitative survey, if that is what you want to do. Apologies.

This was an experiment in and of itself. But yeah.

Sasha: And just to let you know we have about nine minutes before we start wrapping up and then discussing.

Sarah: Cool. Thanks.

We just have a few minutes left.

Wrap it up. We have 2-3 minutes and we will circle back and share what we learned.

Sasha: Okay.

[Whistles]

Sarah: I didn’t know you could do that.

Sasha: Me neither. One down.

So glad everyone wants to keep talking. Just want to make sure we have a little discussion before we go back to the room and wrap up.

Would anyone like to talk about how this went? Talk about the process? Any aspect? What it made you think about? How it made you think about things differently? Let's stick with step one. I know that some people were having a little bit of trouble settling on a topic. That signals that even figuring out what an experiment is might be tricky. Anyone want to respond? Talk about that? Cho?

I think Evan had a hypothesis in mind for the question he was trying to tackle, and we ended up digging deeper and asking him more questions, and through asking him questions he was like, that would be good to know too. We really had good, quality conversations at the very start that made us fill out the other pages a lot faster, because we had already touched on various things.

Do you want to share?

We are launching this next week. We thought about this question of how do we, A, quantitatively measure the value, or qualitatively –

What are you launching?

It is a visual-design-heavy story format, as opposed to: here is a news article. The big question I have is whether that designed experience can drive readers deeper into the story, and then, either for brand purposes or reader comprehension purposes, there are a thousand potential benefits.

The hypothesis being that it will.

Yes, it is good.

Sasha: Great. Thank you.

Any other teams want to share their project and the hypothesis? The idea and the hypothesis? Great.

Sarah: In the meantime, that is an amazing insight.

Experimentation is messy and that is why it took us two years to figure it out. Working through the hypothesis does help get through the process.

Couldn’t be more perfect. Thank you for sharing.

We also talked about something that is in the works.

It is a midterm election project we are working on at NPR News. We lucked out and got a free workshop.

What we ran into was: I am excited about this project, but I hadn't really defined what the goal is and how we are going to measure it. So we had the idea right away, but then we had to take it slower on what the hypothesis is and what this is actually going to accomplish. What we landed on is that it was going to solve for NPR's distribution of election information for potential voters who also have smartphones, because that is the audience who would be interacting with it.

Sarah: Cool. Nice. That kind of allows you to let go of some things when you are creating an experiment. We didn't disregard desktop, but if we were short on time and were going to make it really good on mobile and just a little better on desktop, we could prioritize what we would work on. So even having that specificity in the hypothesis can be helpful long-term.

Sasha: Time for one more?

Sarah: Sure. I think it is on.

We were talking about notifications, but delivering them over Facebook Messenger.

Most people don't buy big expensive phones; they take photos and run out of space. So by using the existing Facebook app to get notifications, you don't have to install our app just to get push notifications from us.

Sarah: Did you consider what the cons of that might be?

We did talk briefly about strategies where you put everything on Facebook and then they decide to destroy your entire industry. That is a con.

Sarah: But it is fair to think about how much you might invest in that strategy and test it first. Absolutely.

Cool.

Sasha: We have about six more minutes. Anyone want to talk about other – sorry, did you have something?

It is funny, we did the same idea: a visual-first story template. I think one thing that came out was that it also highly depends on the implementation, like the quality of the illustrations or photographs, the quality of your user experience if you have a slideshow or something like that. That could easily derail or throw off your findings, so you have to figure out how to –

Sarah: Also people's personal preferences for visuals on phones. We think one template would be great for everybody, but people have different preferences, and how can you get the signals on how they like to consume? Yeah, it always kind of – you start with a question and then you have 10 more. It helps you get through the rest a lot faster, we found.

Sasha: Those kinds of qualitative aspects of an experiment tend to come out in the process when you are discussing what went well or what didn't, like maybe we could have chosen different photos or written the captions differently. It is nice when the process can be iterative, when you build something once and then try again, even though in the real world that is not always possible.

Sarah: Did we want to just do – there are only two minutes left – a quick burn-down on the session? We are going to do this again at ONA, so feel free to skip it, or come back and see if we improved. We know it isn't perfect, so if you have any feedback about what went well, what didn't –

This is quite intimidating.

We kept turning the page and saying there is more.

Sarah: I know. My mom is a fifth-grade school teacher, and I think it was all coming out in that document.

More worksheets.

More worksheets?

No, kidding.

Sarah: You are absolutely right. We will condense next time, I think.

I think we had a ton of survey questions and I had to write in the margins.

Have more space for that.

Sarah: Yeah, it was clear some things were overexplained.

It was very helpful to have the template.

You guys did a good job.

Thank you so much.

I wonder if it would be worth giving people specific prompts, or having a deck saying here is your thing, because we spent a lot of time getting started.

Sarah: Absolutely. That makes sense. I think it is always a balance of letting people be creative, so we were trying to say here are a couple of things, but they were not granular enough to get started on. That is really helpful. We will do that for sure. Yeah?

Maybe a reconfiguration of the document: you start with what do we want to know, and we can figure out later whether we will get that from metrics or from the qualitative survey.

Sarah: Sure.

Maybe a list of what we want to know after we have run our experiment, and then the second step is to label those: which ones can you count, and which ones do you have to ask as a survey question.

Sarah: I think the reason they are all on that page is that we think about what we already know about things, and the qualitative stuff doesn't come naturally to us, so it was meant to bring that out. But maybe we can emphasize that and say: everything you want to know, even off the wall. Were people awake when they were experiencing your product? So, yeah, that is good feedback as well.

I think instead of having all the presentation and then workshop time, you could present page one, then do page one.

Sarah: We were chatting with folks at SRCCON about what might be better, and we were on the fence about that as well. Absolutely, and that might help with how short or long the descriptions need to be, so that probably makes a lot of sense, too. Thank you. I love this; it is the best participatory part.

Just on a really basic level, we didn't even bother to check in with each other. We just dove in, and about midway through we introduced ourselves. We just forgot; we were so focused. I think that opened up our dialogue, and it is just something we forgot to do.

Sasha: Really good point.

Thank you. In general, though, I think what we wanted to do with this process is just introduce a possibility of how you might go about thinking through projects in a way that can open up other information. Also, check your assumptions. We all go into product development with assumptions and don't think we do. It is one way to start that inquiry. If you would like to leave notes on the Etherpad about something coming up in a couple of weeks that you want to tell us about, that would be great too. Anything else?

Anyone want to respond? Comment?

Go get ice cream? A drink? Okay.

Thank you, all.

Sarah: Thank you so much.

[APPLAUSE]