ERIN: I’m Erin.
STEVE: And I’m Steve Stirling.
CARLA: It’s why local journalism still matters. So why are we having this session, and why the ’90s nostalgia? As you can see, I’m all dressed up in ’90s gear. These guys didn’t go with the theme… but oh, well. We’re bringing back the ’90s! Yay! Because it was the last heyday of local journalism. You know, we used to have a lot more, because local newsrooms are pretty much shrinking. In 1990, there were 56,900 full-time journalists, and in 2015 it was half that, according to a survey by the American Society of News Editors. And an interesting tidbit in that: with only ten or so dailies still selling more than 250,000 editions a day, any increase in journalists is being felt at the national level and not at the local dailies. So those of us working on a local level are feeling the depletion way more than national.
STEVE: And for some local perspective from us at the Star Ledger: we probably had ten counties that we covered more than most, and each of those counties had bureaus of ten to 12 people in them. We probably have ten to 12 local reporters today. So it’s been a rough couple of years.
CARLA: And also, let’s be honest, technology-wise, local governments are also stuck in the ’90s. That is the website of the Borough of Lakehurst, where the Hindenburg disaster happened, and they decided to feature that as the worst disaster ever to happen in the city. But they’re still funny little images. Yay. So the thesis, the whole point of this, is: how can we, local data reporters in 2017, bring back ’90s-era news coverage, but with 2017’s limited newsroom resources? And also use 2017 technology with local governments still stuck in the ’90s? So we’re going to solve a lot. So it’s time to stop, collaborate, and listen! All right. So this is the interactive part. We’ll be putting up a bunch of questions on the screen, and for each one we’ll have, like, two microphones going around, and you can tell us your thoughts. The questions are just prompts. You can tell us the problems that you have with this issue, your solutions, your attempted solutions, what failed, what worked, what didn’t. Let’s just have a rap session, y’all. And feel free to vent, and if you feel the need to talk off the record, remember, this is being transcribed, so make sure to say so and it will not be transcribed. I’m also going to be writing an article about this session for the Source blog after SRCCON, so if you need to be off the record, I’ll keep it off the record. All right. So let’s get started with the first question. You guys can hear me okay, right? All right.
STEVE: All right. So one of the things that I was thinking about when we were planning this session is, you know, I want to stop short of saying that news coverage was better in the ’90s, or better when we had more resources, but we certainly were capable of doing more with those resources. So I’ve been trying to think about what we can replicate, perhaps through automation, through scripting, what have you, that we had back when the newspapers were basically printing their own money and had all these resources at their disposal. You know, the thing that I think of is that when I started my career at the Star Ledger, we still had news clerks, which we don’t have anymore. And those news clerks served a function in that they funneled all the information that came into the newsroom, like agendas from town halls and school boards, and they made sure that things got sent to the reporters who needed them when they were not being sent to the reporters themselves. Are there ways to automate that today? I suspect there are in certain cases. So that’s the first question: what are some of the things that we can do, and where have you guys found success, in the sort of cliché of doing more with less, in leveraging technology to replicate some of the resources that we used to have and no longer do in people form?
AUDIENCE: Should I just talk? Hi, I’m Rachael. I work with The Daily in Spokane, Washington. I have an idea, and one of our developers, who is here, actually did the work. We built a scraper that scrapes the jail roster every day, and we made a spreadsheet of, like, school-board people for the region, and politicians and high-ranking police officials and stuff. And so we run the roster against that, and if there’s a match, and we used pretty loose matching because we were going to check it out anyway, then we get an email automatically.
True story, we were scraping a mobile county jail and they decided to take that entire roster off of the scrapable website and put it in the app, just a couple of weeks ago.
CARLA: Yeah, I worked really hard on that and they completely screwed up my work. So I’m really happy now, guys.
STEVE: It’s an especially great place to look, and we don’t think they meant to do this, but they were publishing all their ICE detainees in the county.
CARLA: And now you can’t tell which ones are which. It’s fun.
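The roster check Rachael describes (run each day's jail roster against a list of local officials, with deliberately loose matching) could be sketched roughly as follows. This is a minimal illustration, not the Spokane newsroom's code: the function names, the 0.85 threshold, and the sample names are all assumptions, and the scraping and email steps are left out.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and sort the name parts so that
    'SMITH, JOHN' and 'John Smith' compare as the same string."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def find_matches(roster, watchlist, threshold=0.85):
    """Return (roster name, watchlist name, score) for every loose match.

    The threshold is an assumption: lower it to catch more misspellings
    at the cost of more false positives a reporter has to check by hand.
    """
    hits = []
    for booked in roster:
        for official in watchlist:
            score = SequenceMatcher(None, normalize(booked), normalize(official)).ratio()
            if score >= threshold:
                hits.append((booked, official, round(score, 2)))
    return hits


# Hypothetical sample data standing in for the scraped roster and the
# newsroom's spreadsheet of officials.
roster = ["SMITH, JOHN", "Brown, Peter"]
watchlist = ["John Smith", "Maria Lopez"]
hits = find_matches(roster, watchlist)
```

Because the matching is loose on purpose, every hit is a lead to verify, not a confirmed identity.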
STEVE: Anyone else?
AUDIENCE: So I haven’t used this a bunch, but I interned at the Marshall Project last summer, and my colleague Tom Meagher developed a tool called Klaxon, which basically watches the divs or a section of a website, particularly websites that are structurally pretty simple, and sends you an email when they change. So rather than having to check all these sites regularly to see what changed, that is one way to automate that particular thing.
STEVE: Just another thing, I have an example. Klaxon is great, and it’s free and open source, so you can install it for your newsroom. But on a lesser level, changedetection.org works the same way. I totally forgot that I had set up a change detection for the New Jersey Department of Mental Health and Addiction Services. And at 11:30 at night, they published their annual substance-abuse report, which is a good sign in the first place, that that’s when they’re publishing it. But they had published this report and shared how much of the substance-abuse treatment demand they were meeting, and it turns out, by their math, they were only reaching about 50% of New Jersey, which is important because Chris Christie has been a crusader for treatment in the OD crisis, and he hasn’t been doing a particularly good job. And we also found, on the last page of that report, that they were manipulating the math in order to make their results look better, so it ended up being a good story.
AUDIENCE: David, from the Pioneer Press in St. Paul. It’s a lot easier with campaign finance, for both local and national races. Spreadsheets and other data have made it a lot easier to do the same work in less time, or more work in the same time, in terms of analyzing. And increasingly, in the last few years, the rise of APIs has made it possible to write scripts in advance that handle both the downloading and the processing. So if you write the same basic type of story on fundraising, you can now automate it. I’ve written a script to download the candidates’ financial reports and tell me who their ten biggest donors were, what the three biggest areas they spent their money on were, and all sorts of other questions that I would ordinarily have to pore through a spreadsheet for, in a way that I can replicate.
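The general idea behind page-change alerts like the ones described above can be sketched in a few lines: fetch a page, fingerprint its text, and compare against the fingerprint saved on the last run. This is a generic illustration using only the Python standard library; it is not how Klaxon or any hosted service is implemented, and the function names are assumptions.

```python
import hashlib
import urllib.request


def fetch(url: str) -> str:
    """Download a page as text (not called in the example below)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace, so that reformatting
    alone does not trigger an alert."""
    collapsed = " ".join(text.split())
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


def check_for_change(page_text: str, last_fingerprint: str):
    """Return (changed, current_fingerprint) for this run."""
    current = fingerprint(page_text)
    return current != last_fingerprint, current


# Hypothetical page contents on two consecutive runs.
saved = fingerprint("Annual report  2016\n")
unchanged, _ = check_for_change("Annual report 2016", saved)
changed, new_fp = check_for_change("Annual report 2017", saved)
```

In practice the saved fingerprint would live in a file or database between runs, and a scheduled job would send the email when `changed` is true.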
CARLA: Is this something that you also — the whole newsroom can use, or is this just for yourself?
AUDIENCE: I put it up on GitHub, so anyone could use it, but I’m the only one in my newsroom who does any coding. So it’s a little bit. I’ve done some other projects where I’ve created web apps for things but they don’t really fit this theme of doing — trying to do things we used to do in the ’90s that we can’t do now.
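The kind of repeatable campaign-finance summary David describes (biggest donors, biggest spending categories) might look something like this once the report is in hand. It is a sketch under assumptions, not his script: the column names `donor`, `category`, and `amount` and the sample figures are hypothetical, and the download step is omitted.

```python
import csv
import io
from collections import Counter


def top_totals(rows, key, amount_field="amount", n=10):
    """Sum the amount field per value of `key` and return the n largest."""
    totals = Counter()
    for row in rows:
        totals[row[key]] += float(row[amount_field])
    return totals.most_common(n)


# Hypothetical contribution and expenditure filings as CSV text.
contributions = list(csv.DictReader(io.StringIO(
    "donor,amount\n"
    "Donor A,100\n"
    "Donor B,50\n"
    "Donor A,25\n"
    "Donor C,75\n"
)))
expenditures = list(csv.DictReader(io.StringIO(
    "category,amount\n"
    "Mailers,300\n"
    "Staff,500\n"
    "Mailers,250\n"
)))

top_donors = top_totals(contributions, "donor", n=10)
top_spending = top_totals(expenditures, "category", n=3)
```

The same function answers both questions, which is the point of scripting a story type that recurs every filing period.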
AUDIENCE: One thing I remember from back when I worked at a newspaper in the ’90s was that we would create photo… the Tri-City Herald, they won state and local awards. One thing that they can do is harness the power of images in favor photographers and newspapers that are energy efficient.
AUDIENCE: I just left the NPR Information System. I was in public radio, and I think one thing that we can do, especially given the limited resources now, is make sure that our technical documentation is solid, so that other newsrooms and other organizations can find these projects that sound so exciting and are able to replicate them. Like what David was just saying, it was a team of one, there was nobody else in the newsroom who could do it. And so, while I was at VPR, I was really interested in making sure that everything that I was doing was well documented and could be picked up by another station, and I think there’s a lot more work to do for all of us.
STEVE: We’ve made a greater attempt in that regard, not only to make sure that we’re publishing all of our methodologies, but we also started using data.world, which bills itself as the social network for data. We were watching it for a while from afar, but it actually seems pretty robust as a way to share data easily and quickly, and we’ll talk a little bit about how we’ve collaborated with the Associated Press on that.
AUDIENCE: What I’ve found works when doing data and news apps around local topics is writing very small libraries that do crosswalking between the different kinds of school IDs that maybe the state and the city use, or a library that just lets you iterate through school districts in a county, or townships in a county, or crosswalks between those. Those are sort of, like, things every other piece of code I would write would need to do. And laying that foundation saved me a lot of time and drama.
STEVE: That’s a tremendous point because that is something that we deal with all the time. We are a small state. We have 21 counties, but yet, we have 568 towns that all have their local government. We also have school districts that are self-governing and so we run into the need for cross-walks on a super regular basis and it’s tremendously helpful to have that sort of thing — to have the ability to sort of connect your data in that way, especially when there’s often multiple government codes for the same entity and stuff like that. So that’s great.
AUDIENCE: And I think it’s good to try to be intentional about how you design that. So maybe you have a programmatic interface, but it just sort of wraps a CSV file, so you can use that in your code and you can also just send that CSV to another reporter who might be working in Excel. Something like that turned out to be just a design decision that worked out for me and for others when I was on the research team.
ERIN: Yeah, we do share some data with the reporters, especially the ones that understand data to a certain extent, by packaging census data into individual counties so they can quickly see, you know, for Atlantic County, all the towns, their demographics, poverty, income, and things like that. So that’s kind of a way that we’ve dealt with having all these tasks: by sending the data out to the newsroom so that individual reporters who know more about an area can deal with it.
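The design described here, a small programmatic interface that is only a thin wrapper around a CSV file, could be sketched like this. The class name, the column names, and every ID value below are hypothetical, chosen only to show the shape of the idea.

```python
import csv
import io


class Crosswalk:
    """Thin wrapper around a crosswalk CSV. The CSV stays the source of
    truth, so it can also be sent as-is to a reporter working in Excel."""

    def __init__(self, csv_file):
        self.rows = list(csv.DictReader(csv_file))

    def translate(self, value, from_col, to_col):
        """Look up one ID scheme's value and return the other scheme's."""
        for row in self.rows:
            if row[from_col] == value:
                return row[to_col]
        return None

    def in_county(self, county):
        """Iterate over every district in one county."""
        return [row for row in self.rows if row["county"] == county]


# Hypothetical crosswalk between a state ID and a federal ID.
crosswalk_csv = io.StringIO(
    "state_id,federal_id,district,county\n"
    "0010,9900001,District One,County A\n"
    "0020,9900002,District Two,County A\n"
    "0030,9900003,District Three,County B\n"
)
crosswalk = Crosswalk(crosswalk_csv)
federal = crosswalk.translate("0020", "state_id", "federal_id")
county_a = [row["district"] for row in crosswalk.in_county("County A")]
```

IDs are kept as strings throughout so that leading zeros survive, which is a common way for joins between government codes to fail silently.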
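Packaging one statewide table into per-county files for individual reporters can be as simple as grouping rows and writing each group back out as CSV. A minimal sketch, assuming hypothetical column names and placeholder values (these are not real census figures):

```python
import csv
import io
from collections import defaultdict


def split_by_county(rows, county_field="county"):
    """Group rows by county so each reporter gets only their area."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[county_field]].append(row)
    return dict(groups)


def to_csv_text(rows):
    """Serialize one county's rows back to CSV for opening in Excel."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder rows standing in for a statewide census extract.
statewide = list(csv.DictReader(io.StringIO(
    "county,town,median_income\n"
    "Atlantic,Town A,50000\n"
    "Atlantic,Town B,60000\n"
    "Mercer,Town C,70000\n"
)))
packets = split_by_county(statewide)
mercer_csv = to_csv_text(packets["Mercer"])
```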
CARLA: So we’re going to move on to the next question. Yeah, okay. All right. So how do we deal with local governments with outdated technology. And extra credit, how do we deal with our own newsrooms’ similarly outdated technology? So as we showed you in one of our slides, we have a lot of municipalities with a lot of, like, weird websites, and weird data formats where sometimes if you have, like, data and Steve can show you one instance, interesting instance.
STEVE: So this is a dot-matrix printout. And if you’re too young to know what dot matrix is, it was a type of printer that was popularized in the 1970s and ’80s, and they were largely replaced by laser and ink-jet printing systems. I received this in response to a records request for an electronic copy, in the second term of Barack Obama, from Mercer County, New Jersey. So that’s just an example of what we were dealing with at the local level. And they thought that it was totally fine to provide this to me, a local copy of Homeland Security documents that I was requesting from them. So yeah, we’ve got challenges.
CARLA: Yeah, we have a lot more war stories about how we’ve received data that’s weirdly spaced out. We have lots of tales. But one of the ways that I’ve tried to solve it is I’ve tried to make friends with a lot of people who are trying to change that, especially in the bigger cities like New York and Jersey City. They’re trying to bring some change into the local governments. So it’s been very hard. But I’ve tried to make friends with them and said, kind of like, wouldn’t it be great if you put this up, you know, as GeoJSON, so we can download it, or as CSV, spreadsheets, you know? So it’s kind of like this weird collaboration where we’re doing consulting, and they’re, like, freeing the data that we really want for our journalism purposes. So yeah…
AUDIENCE: I’m from the Dallas Morning News and I work in the National Borough of Announcement. And one of the problems that we had was when you asked for records, there was one agency that would give us a bunch of records but didn’t give us the record layout to go with it and so we had no idea basically what it was we were looking at? At least not enough to report accurately. And they were claiming that for security reasons, they couldn’t give us that information.
AUDIENCE: Hire a secretary of information.
AUDIENCE: And so what we ended up doing, and it ended up working out somewhat in our favor, is working with legislators. We found one legislator in Texas who sort of understood technology because he had been a programmer. You know, he helped us explain to the agency what was going on, and then later on he worked on a law to let other agencies know that these layouts for open records are also records. So by working with a legislator, we were actually able to get access, which is what we want.
STEVE: That’s great. I mean, finding allies, wherever you can find them, is great. If you guys have local press associations and stuff like that, and maybe your news organization doesn’t have access to lawyers, there’s generally, in most states, somebody that you can connect with, somebody who can at least provide you with some advice on how to proceed. And, you know, on your example of being denied a record layout for security reasons: some time ago I wanted, for some reason, to track the movement of all of the black bears that New Jersey had tagged over a number of days, and I was denied that information for Homeland Security reasons, because they told me that…
AUDIENCE: Black bears?
STEVE: Yeah. They told me that if we published that data, which would have been dated anyway, then people would try to track down these bears, and that would be a threat to them, which is a threat to our Homeland Security, which is a stretch… but this same agency claimed that medical privacy rights for an autopsy rights in adulthood so…
AUDIENCE: Good grief.
ERIN: New Jersey just passed an open data act, which was pretty awesome, saying that if you have this data in an available format, you need to put it online and let people view it. Enforcement: we’ll wait and see. So… yeah.
AUDIENCE: Hi, I’m Chris. I work with Cors, and I also work with a site called Talu. So one of the things that I’ve found in working with local court officers and local government officials, and it’s very different from my experience at the national level, is that they’re naïve, and that can cut both ways. It can cut in the way that they’re proprietary and they’re useless. But it can also be quite productive the other way. I’ve gotten records that I shouldn’t have gotten because I asked for them. I asked for more than I needed. I asked for too much. I try not to use that, because then I’m burning the source, but a lot of times these people are genuinely trying to do their jobs, they’re trying to do well, and sometimes it works if you ask for something as straightforwardly as possible and don’t assume what the department’s gonna do. And then the other thing that I would say about that is that we’ve gotten major value out of requesting these big data dumps, and even if they’re not in usable form, we get value in the fact that we can say, we just cleaned up this thing that you sent us in a format that was extremely cumbersome to work with, and now we’ve got some use from it. And ultimately that feeds back into their process, and you can do more with it the next time around.
AUDIENCE: Hi, my name is Ely and I’m currently working for distribution for a while. I started working on this project with a reporter on railroad commission data, and when we contacted the Texas Railroad Commission, the data they gave us, and the guide that came with it, was written for a COBOL mainframe, which is older than I am by several years, and they said that anything else constituted extra work that they would have to put in, and so they would have to charge us for it. And so I took the data and tried to figure out if anyone had found a solution for working with it. And I managed to find an agency that did work with their data. Their fee was pretty exorbitant, but after we saw a screenshot of what they had done, it was like, oh, this is how you use it. So now it’s one option.
AUDIENCE: To the second question first, how can newsrooms deal with similarly outdated technology: I would say the single most important thing I’ve done is that when I signed on to my current job, I negotiated for a computer with administrative access so that I could install apps without having to go through a bunch of hoops, which means that I don’t have to worry about what roadblocks are in place, and can experiment with the wide range of open source, free, shareware, whatever software out there and start doing stuff without having to ask questions, or get a budget, or anything like that. And on the other part, I’ve found that a lot of times, when local governments can’t provide accessible data, it’s that they don’t want to, in part because they haven’t thought about it. They don’t understand why you would want it. Like the ubiquitous: we have a spreadsheet, we’re going to send you a PDF of the spreadsheet, because that’s how they share it around the office. They print it out. They print out a PDF and they can’t understand why anyone would want to get the original Excel spreadsheet. And one thing that I’ve found that’s useful is that every time I ask for data, I ask for a spreadsheet, and even if I’m on a deadline and they say it’s going to take a couple of days to get it, I say, yes, I want it anyway. Even if it’s going to be useless for the story I’m working on, the point is, they should provide it, and hopefully, with these requests over and over again, they’ll get used to making these accessible formats ready, because it will save time in the long run.
ERIN: I have a quick show of hands: how many of you have administrative access on your own computers? How many of you had to battle for administrative access on your computers? I can relate to that. We had to have a long conversation with IT before we could convince them to give us access. Anyway. End of question.
AUDIENCE: I’ve always had pretty good success, especially when dealing with someone in a communications office or a public information officer, when they’re saying it’s going to take this long to get the data, it’s going to be so hard to get it. And then I say, well, not only can I write about that, but how are you guys so inept at dealing with your data? Your job is to deal with your data. And out of all the times that I’ve done that, I think I’ve only written the second story once, because they’ve given me the data.
CARLA: All right. So the next question: What are the challenges in recruiting for local newsrooms, especially for data journalists and coders? And yeah, we’ve had a lot of issues dealing with that. We were just, you know, talking about how we don’t have our own data page, our own nj.com data page where, when someone asks, we have our own page which has our own projects, and maybe it’s for reasons for the original data company that we would have to work with. But, yeah, it’s sometimes a hard sell, especially to some of the data journalists coming out of the universities who are looking at national outlets and trying to get into those great newsrooms, which have great teams and are, like, great places to start. We’ve found that it can be difficult sometimes to be like, hey, local data news and state news, you could do a lot of stuff here, and you might not be living in New York or DC, but you’re probably going to be, like, one of one, or two, or three. You’re going to have so many opportunities to do work with a lot more freedom to do the stories you want, and you won’t be siloed into one thing. So that’s how we’d like to sell it.
STEVE: I will point out that we’ve had good success in convincing our leadership that what we do is important and vital, and we’ve hired our fourth data reporter. She’s sitting right over there, starting on Monday, which is awesome. But yeah, have you guys had challenges in selling, you know, data to local newsrooms and recruiting people or anything like that?
AUDIENCE: So we’re here and recruiting is tough, and it’s clear that we don’t have enough programs that are training data journalists to be ready for a job immediately. And most of those graduates are going to the coasts and to the bigger newsrooms, as you have pointed out, and I think we need to move toward building internship or apprenticeship type things in our newsrooms and basically home-grow our data people, right there in the newsrooms, and I think that’s what we have to push. I think that’s the only solution that we have right now. The data programs are not growing. Although I just read that Columbia University is creating a master’s degree in data journalism. So maybe we’re moving there. But we’re competing on money, we’re competing on location, as you said. You know, I like to pitch the idea that you could be the big fish in the little sea, which I’ve always found to be a good thing. Being a little fish in a big sea, not so much. So I don’t know what else we can do other than that.
CARLA: Yeah, the internships, that’s how we got Erin. So we’ve had success with that, as well. So thank you so much.
STEVE: Yeah, that’s the basic strategy that I’ve been selling. You know, we need to create a good farm system, and we need to bring really talented people up. Like, I’m sure, many of you in the room. Oh, and we might still be hiring, so if you’re looking for work, let me know.
AUDIENCE: Speaking of farming, I had a comment about local journalism schools. So I’m from Tucson, Arizona, and I work at the Arizona Daily Star as a reporter but everything that I know comes from IRE, or SRCCON, it doesn’t come from J-school. So one of the problems that we have when we’re recruiting data reporters in our local newsrooms is that our local journalism schools don’t have programs that teach computer-assisted reporting. And so what we’ve done is kind of take our own initiative to teach incoming interns how to do some basic things, so they could take an interest and pursue further education. But I think it might be a good idea for newsrooms to partner with journalism schools in their area to emphasize why this is needed now.
CARLA: And I might also add that this is doubly true for journalists of color, as well, and we are, we are proud to be a 50% person-of-color data team. So, yay.
AUDIENCE: This might have been implied twice now, and not being a local newsroom. I don’t know, just looking for some clarity but do people — are local newsrooms wanting people that only have degrees in journalism? I know, you know, I think a minority of our staff has degrees in journalism. So if that’s the criteria, maybe that’s, you know, something that you should no longer require. And no longer be a challenge.
AUDIENCE: I’m a mostly self-taught data journalist, and I just want to say to the managers who are hiring out there: I’m a lifelong resident of Washington. I don’t have any interest in leaving the Pacific Northwest. I don’t want to move to New York, or DC, or San Francisco, and be in a big pool with people that know what they’re doing. And that’s something that you can leverage. We have lives outside of work. Like, some of us like to be able to afford our rent. That’s a cool thing in some places. You know, we want to not have rush-hour traffic. There’s a lot of stuff that you can sell there, and, I mean, at a lot of our local papers, especially if you’re not in a big city, you can pay people a lot less and they would still effectively be making more than they would be in a huge city.
AUDIENCE: I just want to second what was said about flipping the perspective, instead of thinking of the question purely as how do we train journalists in data and code. I came from the tech industry and learned news, and I think that’s a perspective that will help to propel the industry forward. There are a lot of people out there that are learning to code right now, and we can take advantage of that, and teach them the skills of, you know, what makes a good journalist and what makes for somebody who has skills to contribute to the newsroom. And I think that overlaps, so I think that flip of perspective is really crucial.
AUDIENCE: I work for an ownership company. We have 122 papers that are in really sexy places like Bloomfield, West Virginia. We’ve had a lot of success selling the beat, versus, you know, just advertising for a journalist to move to the middle of Iowa. If you say that you’re looking for someone to work on fracking, or to look at the coal industry, that’s more enticing to a lot of people than just advertising a certain position.
AUDIENCE: I also think part of it is calling people out when they say, oh, but I’m bad at math. I was an English major and ended up doing data stuff because I decided that — I took a stats class way after college and I said, like, this isn’t as scary as I thought it was. And I think that there is that roadblock for a lot of people who could, like — it doesn’t have to be just like me doing data in our newsroom. It can be, like, other people can be seeking out datasets and playing around with them in Excel. And some of our reporters do. But just calling people out when they say, oh, percent change, I don’t know how to do that. I’m bad at math is important to kind of break down the, like, scary exterior.
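For anyone who hits the "percent change" wall mentioned above, the calculation is short: subtract the old value from the new one, divide by the old value, and multiply by 100. A minimal sketch with made-up numbers (the function name and figures are illustrative only):

```python
def percent_change(old: float, new: float) -> float:
    """Percent change from old to new: (new - old) / old * 100."""
    if old == 0:
        # Percent change from zero is undefined, so refuse to guess.
        raise ValueError("percent change from zero is undefined")
    return (new - old) / old * 100


# Hypothetical example: a town budget line goes from 200 to 250.
increase = percent_change(200, 250)   # 25.0, i.e. up 25 percent
decrease = percent_change(250, 200)   # -20.0, i.e. down 20 percent
```

The asymmetry in the example (up 25 percent, but down only 20 percent on the way back) is a common source of errors in copy, because the base of the comparison changes.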
STEVE: I would totally second that, because I work a lot with my alma mater, which is in New Jersey, and the biggest struggle is selling the kids on why they need data journalism and on not being scared of it because it’s all numbers. You know, I’d love to hear examples of how people have been successful in doing that, because I think that’s still a challenge for me: making data sexy to people, and making it work.
CARLA: We have time for, like, one more question. Not question — ask? Prompt.
AUDIENCE: I just wanted to build off that idea, because I’ve kind of had this trajectory of being an English major, and being bad at math. And the way they teach math in American schools is something that we could have a whole panel on, but I’m convinced that math is actually not just abstract math, that it’s sort of a language for understanding the world, and framing it that way gets you a little bit further with more literary, wordy people like me. So that was helpful.
CARLA: Okay. So next question. Erin, do you want to take this one?
ERIN: So I think in New Jersey, we have a relatively suburban and urban population, but in the south of the state, there’s still a large number of people who don’t really get quick Internet access at all. And I think we also see a difference in how people read and consume our journalism. Some of them, you know, they only look at our site when they are at work on a desktop. Some of them are commuting to New York, and so they’re doing it on their phone, on the train. And each of those kind of comes with its own unique challenges, and I think there’s also a question of data literacy in rural areas. There are people who don’t necessarily know what a histogram is, or what the median of a dataset is versus the average or mean. So I was wondering, well, it’s kind of a multipart thing, but how do you present your projects so that people can easily consume them no matter where they are, and so that they can easily understand them no matter where they’re coming from?
AUDIENCE: I think one thing that’s important is that on every team, on every data project, someone who doesn’t understand data should be on it, and you should listen to them, but not too much. You need to get the perspective of someone who doesn’t have the deep understanding of the data, math, statistics, and code that a lot of data reporters have, so you can understand if what you’re saying, if this beautiful chart you made about this dataset that you’ve worked so hard to get, is maybe just totally incomprehensible to the average person. But also I think it’s important not to give too much deference to that point of view, because I think, A, challenging our readers can be good and we don’t necessarily want to pander to the lowest common denominator, and I think people in the newsroom can be too cautious about going too far. So it’s important to bring people into these projects who can provide the layman’s perspective, while not being too deferential to them.
STEVE: I think that’s a good point, and I think I’ve gotten that in a lot of my collaborations with our columnists, who are like, “Shut the fuck up! Just tell us the story!” But that humbling moment can be ultimately productive in certain situations.
AUDIENCE: I was just gonna say, make sure your stories are about people and not just about numbers.
ERIN: Totally true, and actually if you guys have any suggestions about how to do that, that would be good. I don’t know how much time we’ll have to talk about that.
AUDIENCE: I guess I have a little different take on the rural. I work in Oregon and I work in Portland in our paper, basically, at this point in time deals with our metro area, but Oregon is a big state and there’s a lot of rural towns in the Pacific Northwest and Alaska and everywhere, so forget that. Just to respect the people that live in the rural communities I think is something that we forget to do. I don’t know if you guys read the story of the teenage boy who — the whaling story out of Anchorage. Read it, I think every journalist needs to read it. It was about a boy who killed a whale for sustenance, and it just went viral, and he got death threats just because that part of the country is just not aware of the power of mowed. And I just think that that’s something that we need to deal with, and then just be aware when we do our campaign stuff, I always am assigned my counties. I know my rural counties are going to be a little bit later. So I call ahead of time and say, hey, I’m in charge of your counties — are you guys, do you guys have a website? Do you put it on a PDF? And just to be aware that they have limitations and to respect the fact that they have limitations, I think, can make it a little bit easier for us.
STEVE: Yeah, I mean, I think the previous point is a good lead-in to that and, you know, making sure that we, as reporters remember that, you know, ultimately when we’re writing about data, we’re writing about people, and connecting their stories back to that and it’s not just about numbers. And I think that is — the my most successful projects are the ones that are grounded in that real team story that have a data backbone to sort of swing the hammer over it. But ultimately people connect to people, and, you know, that — that — that, I’ve found, helps people connect people to stories, and helps to connect to — whoever you’re writing about in whatever case. You’ve established that connection and it goes a long way.
ERIN: I’m just curious. How many of you guys have newsrooms located in rural areas? How many of you guys have at least part of your coverage area in a rural area? Yeah. I think that there’s also kind of a risk for those of us who are located in suburban or urban newsrooms to kind of, well, not necessarily look down, but see them as the other, and not have the cultural awareness that comes with a rural county, and I guess that’s true with any other culture, minority-majority area or something like that. So that’s something that we have to be sensitive about.
AUDIENCE: So this gets a little bit back to the earlier point about helping people understand visualizations or analyses that we may have done that might be kind of complicated. You know, on the Internet they always say, don’t read the comments. I actually really like reading the comments on the data stories that we write, because we’re responding to feedback, like, on social media. This actually happened last week, that we got some feedback about a little data thing that we did, and I think for a lot of reporters in our newsroom, the instinct is to just ignore it, they don’t want to even look at the comments because they’re afraid of what they’ll find. But I’m really encouraging engaging with those people because it builds trust in the data, it builds trust in our ability to do analyses, and it helps us answer questions about, you know, why we built, like, a map a certain way or something like that. So it gets back to your point about building a human connection. I think it also helps to remind people that we’re people, too, on the sidelines and we’re just trying our best to get the information out there, so that’s one thing that I recommend.
ERIN: Side note, so one of the common questions I get is so what are the statistics in my town? If I cover something blah, blah, blah, a particular area has blah, blah, blah statistics. But people are curious about their own.
STEVE: Just in piggybacking off of the comments thing, I really enjoy engaging with commenters, not all the time —
AUDIENCE: Not all the time —
STEVE: But just, Chris, was it you that has made it a practice of engaging with commenters on websites that you don’t generally read?
AUDIENCE: I have tweeted about that. Not necessarily engaging, but I try to make it a habit of reading comments on websites that I often disagree with because the comments are more or less the zeitgeist of the article. In particular, if you ever read the National Review, the comments almost uniformly disagree with the substance of the article, which tells you more about conservatism than the National Review does.
CARLA: All right. So we’re going to move on to the next question.
AUDIENCE: I’ll just be real quick. One of the interesting things that we’ve found: I’m based out of Salt Lake City and we have lots of rural areas around where we’re at. And sometimes we look at it kind of like this negative thing, ugh, we have to support the rural communities, but, for example, with high school sports coverage there will always be the small, tiny schools, and we’ll stream like the strategies and games, and things like that, and we’ll have this tiny, small, one way, two way chat that will have hundreds and hundreds of people on it because they couldn’t drive three and a half, four hours to get to a game. So if you find things that they want, and they need, we’ve found that our rural communities can actually be loyal readers and engagers of content.
CARLA: Awesome. All right. Next question. Can we team up with national organizations? What are the challenges in localizing national data stories? Steve, do you want to take this one?
STEVE: Sure. So on the first part of that question, I mentioned data.world earlier, something that we were experimenting with. We partnered with the Associated Press on a pilot program that they’re running where they’ve started, for people that have some sort of subscription with them, disseminating data that they work with on national stories to the member newsrooms to work with on a local level. I’ve found it to be pretty effective thus far as a means of collaboration. You know, they try to work on their own end in this regard, we’re working with them already, and they have these partnerships in place already with these newsrooms. And so they just started publishing the data, you know, in a collaborative manner on data.world, and that happens to be what they’re using. And we’ve had some great localized stories out of it where they wouldn’t otherwise have gone down to do the detail-level work. So it’s just an example of how we’ve been successful at collaborating with a national organization before.
AUDIENCE: I bring up a problem and somebody else can figure out what the solution is. But one problem is so, you know, there’s a national news organization that is offering up data and stories, and then our editors will say, well, we haven’t looked at that data, or we haven’t — or how do we trust it, or how do we — so if anybody has any ideas about how to answer that.
STEVE: We run into that all the time! Um, you know, I’ve found that a lot of our editors in the newsroom are pretty entrenched in the sort of proprietary nature of the industry, or the more competitive nature of the industry, which hasn’t really existed in the same way in my career. So it’s been a slow battle, but just, you know, going through it with them when we had a situation where it would be like, hey, we have this opportunity, and that goes for other collaborations that we’ve been involved with, whether it’s with ProPublica, and stuff like that. Some of them have been super frustrating but, you know, giving the best, most detailed case… I usually send a detailed letter just to the people involved, saying, look, this is really great. This is work that has already been done for us, and, you know, it’s something that we can build off of. I’ve vetted the data. This is why it’s good. This is why this analysis is sound. Here’s the methodology. And just presenting as clear a case as you can. It doesn’t always work, but I’ve found that being persistent with that approach and, you know, encouraging that sort of collaboration and seeing it become more common has started to pay off in that regard for us. Still not total but we’re getting there.
AUDIENCE: Just quickly, a side note, I think one of the ways that big national media outlets could be helpful for local media outlets is just as examples, not just looking at projects and saying, hey, let’s do that, that’s a thing. When I was fighting with them, trying to open source our code and put it on a GitHub page and make it public, the best argument that I’ve had against the concern that maybe our competitor will get an advantage over us, et cetera, is to say, the New York Times, the Washington Post, and the Wall Street Journal have all done this stuff, so why can’t we? And the power of that example from acknowledged industry leaders has been very powerful to overcome the inertia of tech-skeptical local news editors.
AUDIENCE: I was just gonna say, one way for national publications that are data driven and want local news organizations to pick it up: I would say being in communication as early on as possible with the lower-ranking reporters that might really understand the data, or might be really interested in the story, because if you can get buy-in early on, they can convince their editors to run with it.
ERIN: Sure. And this is kind of the place where those discussions totally happen.
STEVE: As a jumpstart for the second part of this question, one of the things that I’ve sort of found as my career has progressed is, you know, the challenge of localizing national data stories for me has often been fear. Just, like, I don’t think I can do that. And more often than not, I think state and regional publications are often better positioned to do these stories on a local level, and I’ve had a couple of examples of that. You know, recently, like many of you might have seen, the Times did this really big, great story where they projected the deaths for 2016 back in June, and it’s skyrocketing across the United States. You know, we don’t have the resources to leverage that kind of work. They polled every, you know, state in the country and, you know, extrapolated based on, you know, what they were able to find. But I was able to do that with New Jersey, you know, for our coverage area. And found that, you know, they projected New Jersey was going to have 1,500 opioid deaths in 2016. And it turns out when you get down, and we had the resources to do that, it turned out to be actually 213, or 205 deaths. So maybe back then, I wouldn’t have pursued those things. But using those things, you can’t replicate them exactly, but using them as a launching pad for your own stuff, I don’t think anybody should ever.
AUDIENCE: I’ve got a question. Something I’ve thought about is whether Code for America or other data agencies would be partnered with local news organizations. I’m curious if anyone’s done that, or has experiences.
AUDIENCE: We, Pioneer Press, had, I think you set it up, a state and local government salary database that we maintained. And with various cutbacks in the newsroom that we have had, we haven’t had anyone to keep updating this database. We kept filing FOIAs and getting information. And so we shut down, and closed down the site because it was out of date and all that. And then we immediately got a flood of questions and complaints from readers saying, where is the salary database? And that’s when we realized how popular it was, and it turned out that there was a local data group, and we ended up partnering with them and they took on a lot of the gruntwork that we had been doing. We’ve put up some work that we ordinarily wouldn’t have done. And then they came, and we were hosting it. And we were able to get back to where we were, despite the fact that we didn’t have the manpower to do what we were doing before.
AUDIENCE: You didn’t have the manpower to track the popularity before?
AUDIENCE: I don’t know. It wasn’t my project.
AUDIENCE: So I’m a founder of Code for America Miami. I can only speak for Miami, but in our experience it’s been more of a challenge, but only because I think the techies that are trying to come in and learn about these concepts don’t necessarily understand the proper data stories. And then, the journalists don’t necessarily know that we exist or they’re not quite sure how we can help per se. So that said, I feel there’s always kind of projects where we’re trying to build things that would be useful. For example, we take State of Florida inspection data, which is, you know, in CSV format, and we build RESTful APIs from it. So that’s been our challenge. It’s not that we don’t want to help, and this is, of course, me speaking as a brigade captain. But more than that, the right synergy hasn’t happened and I don’t know what that takes.
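As an illustration of the CSV-to-API idea mentioned above, here is a minimal sketch using only the Python standard library. It is not the brigade's actual code: the column names, the sample rows, the `/inspections` path, and the `county` query parameter are all made up for the example, and a real inspection export will look different.

```python
import csv
import io
import json
from urllib.parse import parse_qs

# Hypothetical sample data; real inspection exports will have other columns.
SAMPLE_CSV = """license_id,business_name,county,violations
1001,Cafe Uno,Miami-Dade,3
1002,Dockside Grill,Broward,0
1003,La Esquina,Miami-Dade,1
"""


def load_records(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def make_app(records):
    """Build a tiny WSGI app: GET /inspections?county=NAME returns JSON rows."""

    def app(environ, start_response):
        headers = [("Content-Type", "application/json")]
        if environ.get("PATH_INFO") != "/inspections":
            start_response("404 Not Found", headers)
            return [b'{"error": "not found"}']
        # An optional ?county=... filter; with no filter, return every row.
        query = parse_qs(environ.get("QUERY_STRING", ""))
        county = query.get("county", [None])[0]
        rows = [r for r in records if county is None or r["county"] == county]
        start_response("200 OK", headers)
        return [json.dumps(rows).encode("utf-8")]

    return app
```

To try it locally, the app returned by `make_app(load_records(SAMPLE_CSV))` can be handed to `wsgiref.simple_server.make_server` from the standard library. Note that `csv.DictReader` returns every value as a string, so numeric columns would need converting before any analysis.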
STEVE: I think, similarly to, you know, the nature of collaborations in general with national organizations and stuff like that, there’s reticence on the part of a lot of newsroom leadership. I think it takes time, because one of the things that we’ve been trying to do is get off the ground a New Jersey version of the California Data Coalition, which has been super successful, but I think that we’re going to have to… there are any number of organizations that are willing to host this data but, you know, getting our editors and our leadership to sort of be able to make that connection was a harder sell than I thought. But like I said, it’s not impossible. As was rightfully pointed out, it’s still very new. It might take some convincing.
AUDIENCE: As someone from a local news organization, I have a big request for national organizations, and that’s to open source data scraping tools and data cleaning tools as much as possible, because if you’re building a tool that scrapes data from 15 different states and imports it into different systems, that could be useful to newsrooms in 15 different states. I’ve definitely worked with tools that were developed for bigger newsrooms and I would love to do more of that.
ERIN: Yeah, it’s much easier to sell my editor on, hey, I’ve based this on code that happens to be from a national institution, versus, I have this data from a national organization that they’ve collected somehow. I’m not certain of the methodology, how they got it.
AUDIENCE: So something I hear about these collaborations from higher-up editors above me is: Well, we tried that once and it was a complete disaster, and they aren’t up for trying it again. And I think that’s the biggest challenge that you might have to face. The other thing that we have to fight here in Minnesota is the trend on the national story is different here. And so we don’t quite have the same story, and then it becomes hard. I have started using the Associated Press’s data.world. And one of the things that I like in there is you can quickly visualize the data and you can see where your state sits and quickly see, is this going to be worthy for us or not, rather than having to download all the data and do some analysis to figure out, are we in the middle of the pack like we always are, or do we spin that one way or another, and be able to pitch it to the editor. So I’m kind of hopeful that maybe that platform might be more enticing for at least some of that. And someone else pointed out about getting some of that advance notice. AP doesn’t give you a ton of advance notice. I would love to get a notice saying, we’re working on blah, blah, blah, it’s coming down the pipe, heads up. Because we often don’t have time to turn it around by when they’re going to publish it, and so we’re late or something. But the other thing I would like to see more of would actually be regional collaborations. I think that would actually be more effective.
You know, we have state school taskforce coming out on Monday, and every year, I sit and think about my colleagues at outstate papers around different parts of Minnesota where I used to work, and they don’t have the manpower, probably, to even pull out their schools, or their districts and do any kind of basic analysis, maybe, I don’t know. Quite honestly… you know, is there some way to make that, help, you know, if I’m already doing it, is there some way that I could provide some piece of it to them? I also wondered if that was a revenue source for some of us. But, you know, helping each other out. So even some simple things like that are things that we can do, where we’re not directly competing with each other, really. I mean, in Minneapolis, we’re not competing with the state a lot of the times, truly, we’re not. That’s my thought.
CARLA: Preach, preach.
AUDIENCE: Has anybody else seen this kind of collaboration in the rest of the country?
STEVE: I think for us, at the local level, that is where we see some of the major competition still entrenched even if it doesn’t naturally exist in any kind of substantive way. You know, I don’t think I would ever dream of, you know, pitching a collaboration with the Bergen Record because that’s our arch rival, and even if I talk to all their reporters, and their data reporter, and we inform and collaborate behind the scenes and stuff like that, you know, it’s a tough sell. It’s crazy because, you know, realistically, we’re fighting the same fight and we’re both producing our own stories with the same data. And we’re going to mine different things ultimately most of the time. But it’s tough.
AUDIENCE: Can I just comment on that? I think at the end of the day we’re actually really going against platforms like Facebook and Twitter which are really brand agnostic, and I think even with email newsletters, people don’t care about these which are internal, and if we continue to battle for the same story, et cetera, I think we’re generally losing an audience that’s bigger and I think is more interested in a media company that’s more than just them and that’s, regardless a resource. So I want to hear how collaborations, it just feels like now is really the time to start working together because of all those other questions we just answered, right?
CARLA: All right. We have one more question but it’s honestly not that important. Like, let’s just keep on, and also, if you still have more thoughts on other questions from before, like, please feel free to, like, change the subject and go back to something that we talked about and have more thoughts on it. We have ten minutes to fill, so this is free time. So, yay!
AUDIENCE: I just wanted to share a collaboration idea. So I’m in Spokane, so we’re one of the four or five papers in Washington that are reasonably large but not the Seattle Times, and I’ve been kind of working with people at other papers to have some kind of shared datasets and stuff that we get from the state that they charge for. So it’s really good health department data but it’s $50 per dataset per year, which is not too bad but enough that it makes sense to not buy it, and the Seattle Times just has that stuff, and it’s like, why don’t we report it, so we have been just having each paper buy one set and have been putting it all together. I mean, I don’t know if people are doing stuff with it but I am. So…
AUDIENCE: I’m at a tiny paper in Iowa, and in the state, not us, but we have a couple of non-profits, the Iowa Freedom of Information Council, and there’s a watchdog journalism group, where they will say, hey, we’re working on a story about this, if any newspapers want to collaborate with us. And they create a story and each publication leads with their own story that does it for their community, or town, and it’s been really successful.
AUDIENCE: So I wanted to build quickly on what you were talking about not to build on national stories, I can’t count the number of stories even with the short term that we’ve been doing the project that I think it would be easy to localize and realizing that the local story isn’t an obvious subset of the national story. And so our primary constraint is reporting resources because we started volunteering our free time and one thing that I argue for is when you are putting out a national dataset, and putting out something that’s local level stuff, look at the nuance of that, and you can see if there are regional, or subregional stories that fall out of that. It certainly makes it easier for other agencies to localize it.
AUDIENCE: Sort of related, but going back to the previous question about local governments, one thing that you can do is FOIA for retention schedules, so a FOIA officer can’t say we don’t have that for a database. And knowing what data exists is really, really helpful. One of my colleagues. And as far as localization goes, if you are national, it can produce data direct effects and stuff like that, make it amenable to share them. It’s better to do that than to have somebody screenshot your stuff. Localize with GeoIP if you can. One thing that we’ve found at NPR with national stories is we’ve done some A/B tests, and when we zoomed in on each state to show how much money was being spent per pupil in the school district, people were more likely to finish the story when we localized the graphic. So not only did they spend more time interacting with the graphic, they were more engaged with the content as a whole, kind of going back to the what about my town sort of thing, and was sort of a great example of high-availability resource is amazing.
AUDIENCE: I think one thing that I’d like to see, being out in Salt Lake City: a lot of times we’ll see some cool Washington Post, or other national organizations and I say, yeah, but you have to realize how many people are working on that. But it’s really hard to be aware of the really cool things that other regional papers are doing. It’s like, it doesn’t happen very often. So it would be awesome if there’s a way to be aware of what the other regional papers are doing and knowing that, hey, they have very similar constraints to what we have, and kind of have the examples and going with collaborations with regional papers to say, oh, because if we’re on a call with the Washington Post saying, hey, I want to do this cool project. It’s like, okay, it’s not like if you do that… we can talk to another paper and say, hey, let’s find a way that we can try and find the same projects, and find ways that it’s localized and you can run the Salt Lake City version of it and you can run the Minneapolis version of it.
AUDIENCE: Can I give you a suggestion? IRE has the blog, and all you have to do to get your story on there is send them an email. That is it. And it will go on. So send them your stories and people will be able to see all those regional stories. Because now, even if they go and find them, they will find the big boys. So if you send an email, it will get more of the smaller stuff in there.
STEVE: Another thing about the data, is totally a thing to envy and stuff.
ERIN: And with the open source sources letter.
AUDIENCE: I think it’s called made.
STEVE: And just to piggyback on the stuff that I talked about earlier, I don’t want to seem down on collaboration because it seems frustrating. I think it would all serve us to keep trying because the one thing that I love about this sort of type of journalism is the sense of community, that sort of collaboration is what drew me to it in the first place, so I think we’re perhaps better positioned than other people to evangelize that and to continue that, so I would encourage you all to bridge that gap if it still exists.
CARLA: We have three minutes left so…
ERIN: You want to move on to the last question, real quick?
CARLA: Um, I mean.
ERIN: I don’t know if we have enough time. What’s the worst format that you’ve ever received data in?
CARLA: Well, that’s not the question but okay.
STEVE: I had a fun one. A week before class, one of our reporters came to me and she had requested a table in a spreadsheet, and it was just weird wording that she put into the request, and so the open records officer pasted the HTML of a table into one cell of Microsoft Excel…
[ Laughter ]
… and technically fulfilled her request.
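For anyone handed the same thing, a table's HTML source sitting in a single cell can usually be turned back into rows. This is a rough sketch using Python's built-in `html.parser`, assuming the cell's text has already been copied into a string and that the table is simple (no nested tables, no merged cells). The `TableExtractor` and `html_table_to_rows` names are made up for the example.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html_text):
    """Return the table in html_text as a list of rows of cell strings."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

The resulting list of rows can be written straight back out with `csv.writer` to get the spreadsheet that was asked for in the first place.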
AUDIENCE: Wait, this happened to you, too?
AUDIENCE: I’m pretty sure most of you have dealt with this kind of situation. With the Tucson Police Department, they took a database and put the records into data tables, printed them out onto thousands of pages of paper, and then scanned them just to piss me off and sent them in PDF format.
STEVE: That’s like a local government special right there!
ERIN: You had the COBOL, right?
AUDIENCE: Other than the COBOL mainframe, I received a spreadsheet of images instead of values. Each cell contained an image of text.
AUDIENCE: How does…? That even…?!
AUDIENCE: You have to work to do that.
AUDIENCE: That’s beautiful.
AUDIENCE: That is trolling to a breathtaking degree.
AUDIENCE: I got a machine-readable format, technically, of the boxes that I was interested in, but the particular way they laid it out for print made it almost impossible to do anything with it without hours and hours of reformatting. So they had multiple years of data for some of the instances and they separated them in the same cell, or put them in parentheses in the same cell, or updates, and then, of course, the headers were all merged cells, and split cells, and one header was on three different cells in a row. Everything that they could have done to make a pure-text spreadsheet difficult to work with programmatically, they had done on that spreadsheet.
ERIN: That’s when you have data manipulation.
AUDIENCE: So we received solitary confinement roster data from the Minnesota Department of Corrections where, basically, every single instance of somebody being put into solitary confinement was all chopped up into overlapping date ranges and, like, date ranges inside of other date ranges, and sometimes they were consecutive, and they were all out of order, and we had to write a lot of different algorithms just to collapse them into single date ranges so you could figure out who was in solitary confinement for how long, because otherwise it was completely impossible to find out. And we found out that some people had been in solitary confinement, like, one guy was in there for 5,000 days, but you would never figure it out looking at all these different entries for it that were all two degrees of this.
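The collapsing step described here is the classic interval-merging problem. What follows is a minimal sketch of one way to do it, not the newsroom's actual code. It assumes each record is a (person ID, start date, end date) tuple, that end dates are inclusive, and that a range starting the day after another ends counts as consecutive. The sample records are invented.

```python
from collections import defaultdict
from datetime import date, timedelta


def merge_ranges(ranges):
    """Collapse overlapping, nested, or back-to-back (start, end) date ranges."""
    merged = []
    for start, end in sorted(ranges):  # sorting fixes the out-of-order entries
        if merged and start <= merged[-1][1] + timedelta(days=1):
            # Overlaps, sits inside, or directly follows the previous range.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def stays_by_person(records):
    """Group (person_id, start, end) records and merge each person's ranges."""
    grouped = defaultdict(list)
    for person_id, start, end in records:
        grouped[person_id].append((start, end))
    return {pid: merge_ranges(r) for pid, r in grouped.items()}


def longest_stay_days(ranges):
    """Length in days of the longest merged range, counting both endpoints."""
    return max((end - start).days + 1 for start, end in ranges)


# Invented sample: out of order, one nested range, one consecutive range.
records = [
    ("A", date(2015, 3, 10), date(2015, 3, 20)),
    ("A", date(2015, 1, 1), date(2015, 1, 31)),
    ("A", date(2015, 1, 15), date(2015, 1, 20)),  # nested inside January
    ("A", date(2015, 2, 1), date(2015, 2, 14)),   # starts the day after Jan 31
    ("B", date(2015, 5, 1), date(2015, 5, 3)),
]
stays = stays_by_person(records)
```

With this sample, person A's four entries collapse into two stays, January 1 through February 14 and March 10 through March 20. Whether back-to-back ranges should be joined is an editorial decision as much as a technical one, so the one-day tolerance is worth confirming with the agency that produced the data.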
AUDIENCE: Weren’t there, like, leading spaces in the cells, too?
AUDIENCE: Yeah, there were leading spaces and name inconsistencies. It had been taken essentially from written records and, like, different people were putting it into the database over time, and so they all had different ways of doing it.
AUDIENCE: This isn’t really a data example but it’s a funny example. So Texas just made a lot of the information about the death penalty and the drugs they use to execute people private, so you can’t get any information, basically, about how executions are conducted, and so I asked for a bunch of information about the death penalty and how it’s conducted. And so one of the things that the Texas Department of Criminal Justice sent me is an exhibit that was a letter, that was 31 pages long, a PDF file they sent me. The first page said E, and then the remaining pages were blacked out. It was awesome.
CARLA: So I guess you’re technically free to go because it’s 4:17 but we can keep going if you want to stay.
[ Applause ]