AI Tech in Journalism-Episode 1
In my first episode, I’m joined by Michael, who has worked for over 10 years in the Tech industry. He graduated with a Bachelor’s degree in Astrophysics and an MBA. He has worked with different iterations of predictive language technology and has some great insight into the future of AI tech and its potential impacts on media and the journalism industry.
Alayna: [00:00:00] All right, so, my name is Alayna. I don't have a name for this yet, just AI and journalism and what it means for the media industry. I am joined by Sonya, a fellow RSJ grad student, and Michael. Michael, can you give me some brief information and background about [yourself]?
Michael: Yeah. Yeah. So I was born and raised in the Bay Area. I have worked at technology companies my entire career, uh, ranging from really small startups of like 10 people working in a basement or a garage to giant multinational companies that anyone listening to this has probably heard of.
But I've also specifically worked on a lot of AI chatbots, both in my free time and for work. I've also built some kind of backend services that chatbots would call and, you know, do things with. [00:01:00] So I'm pretty well versed in the space, at least the old world.
Alayna: So I guess, can you briefly explain kind of how it works? Is it just kind of doing the auto-type thing, or you know, is it actually able to process your questions…
Michael: Yeah. So fundamentally, like, machine learning at the end of the day... Now, the nuts and bolts, like, there's a lot of algorithms and a lot of math that's done and it's all really complicated. And to be honest, the individual algorithms that are used don't really matter, and everyone sort of uses a different cocktail of them, if you will. But at the end of the day, the whole, like, prime function of AI or machine learning is to learn patterns and to identify those patterns. So pretty much every AI or machine learning tool that exists is given, you know, a set of [00:02:00] data and a set of parameters to look for in that data. And it's given validation as to whether or not it got it right or wrong. And so, like, fundamentally, pretty much any machine learning thing, from the old-school chatbots that are just kind of call-and-response to the newer Chat GPT generative stuff, at the end of the day, that's all they're doing.
Alayna: Is the information that we put into it, you know, as we go... Because one of the things about Chat GPT is it says it only has information up until September 2021. But is that necessarily true anymore? Since we're putting in so much information, is it able to retain that?
Michael: Probably what that means is it only has training data up until then. So, like, when you, so, right. You build a set of data that describes what you're looking for. And you also typically, if you're doing it right, give it a bunch of data on things you're not looking for.
And a bunch of edge cases, cuz there's always [00:03:00] gray areas. And then you just dump a bunch of data into it and watch it, like, chew on that, and you validate what comes out. What I'm guessing is that the initial training data only goes to 2021. Like, they've only indexed human knowledge, whatever they define as human knowledge, up through 2021.
But you're right in that everything that someone says to it trains it further. Yeah.
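Editor's sketch: the loop Michael describes (examples of what you are looking for, examples of what you are not, then validation of what comes out) can be illustrated with a toy word-count classifier. This is a minimal sketch under stated assumptions, not how Chat GPT or any production system is trained; the plumbing examples, the function names `train`, `predict`, and `validate`, and the scoring rule are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def train(examples):
    """Count how often each word shows up in positive vs. negative examples."""
    counts = {True: Counter(), False: Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Guess True if the words look more like the positive examples."""
    words = text.lower().split()
    positive_score = sum(counts[True][w] for w in words)
    negative_score = sum(counts[False][w] for w in words)
    return positive_score > negative_score

def validate(counts, held_out):
    """Fraction of held-out examples the model labels correctly."""
    correct = sum(predict(counts, text) == label for text, label in held_out)
    return correct / len(held_out)

# Hypothetical data: True means "this is a plumbing question".
training_data = [
    ("my sink is leaking", True),
    ("the pipe burst in the kitchen", True),
    ("water is dripping from the faucet", True),
    ("what time is the game tonight", False),   # things we are NOT looking for
    ("recommend a good pizza place", False),
    ("my laptop will not turn on", False),
]
held_out = [
    ("the faucet is leaking water", True),
    ("where is a good place for pizza", False),
]

model = train(training_data)
accuracy = validate(model, held_out)
print(f"held-out accuracy: {accuracy:.2f}")
```

The held-out examples play the role of the validation step: they are never shown during training, so they indicate whether the learned pattern carries over to new input.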
Alayna: And that's, I guess, where the people who've been trying to stage jailbreaks or whatever, or get it to break its code, are coming from too, right?
Michael: Yeah. I mean, like, it's hard to do that.
I mean, so like, I guess it depends on what you mean by jailbreak. If you mean, like, make it say naughty things, that’s easy to do. But you need a lot of people to do it. Like, there's that typical example [00:04:00] of Tay AI from Microsoft that had that problem. And that was just because, you know, the people on the internet who got to it were, like, your 4chan, Reddit-user sort of folks. And they just completely polluted the validation step that we talked about. But as far as exploiting it or breaking it or making it sentient or anything like that, that's kind of impossible.
Alayna: Um, did you hear about Dan?
Michael: No, I've not heard about Dan.
Alayna: So Dan was something where a bunch of hackers were able to make it create a second personality that was breaking code or whatever.
Michael: Yeah, I mean, it's hard to break machine learning stuff because, like, at the end of the day, all it's doing is learning whatever you throw into it and validating what comes out.
So [00:05:00] you can, as I said, teach it to do naughty things. But you can't necessarily break it. You can break chunks of it. Like, I definitely, as part of battle-testing different AIs, would literally throw like a megabyte of lorem ipsum, like random text, at it and just watch it completely choke.
But that was more of testing the network and not really testing the bot itself.
Alayna: So has the AI technology actually made this huge leap forward, or is it more that, with the introduction of Chat GPT, we've become more aware of it as a society?
Michael: It's both. So, like, the core, I'm going to guess, I don't actually know, like, I'm going to guess that a lot of the core functionality isn't different. What they're doing though [00:06:00] is they're diving a level deeper into understanding language. So, like, in the old days, and actually you can do this now, there are free tools you can do this with, where you can make your own chatbots. And what you do is you basically tell it, okay, I'm looking for this input.
And you set a list of outputs and potential outputs. And it'll basically pick one at random. And some of the fancier ones have a kind of fuzzy logic to take synonyms or a badly structured sentence that has the same idea. But you still have to hardcode both the call and the response.
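Editor's sketch: the old-style chatbot Michael describes, with a hardcoded call, a list of possible responses picked at random, and some fuzzy matching for synonyms or badly structured input, might look roughly like the following. This is a minimal sketch under stated assumptions; the `RESPONSES` and `SYNONYMS` tables, the `reply` function, and the 0.6 similarity cutoff are hypothetical, and real chatbot-building tools differ in how they match input.

```python
import difflib
import random

# Hardcoded "calls" mapped to lists of possible responses (hypothetical data).
RESPONSES = {
    "what are your hours": ["We're open 9 to 5.", "Nine to five, weekdays."],
    "where are you located": ["We're on Main Street."],
}

# A tiny synonym table so differently worded input still hits the same call.
SYNONYMS = {"times": "hours", "situated": "located"}

def normalize(text):
    """Lowercase, trim punctuation, and swap known synonyms."""
    words = text.lower().strip("?!. ").split()
    return " ".join(SYNONYMS.get(word, word) for word in words)

def reply(text, rng=random):
    """Find the closest hardcoded call and pick one of its responses at random."""
    key = normalize(text)
    # Fuzzy step: tolerate typos and small wording differences.
    matches = difflib.get_close_matches(key, list(RESPONSES), n=1, cutoff=0.6)
    if not matches:
        return "Sorry, I don't understand."
    return rng.choice(RESPONSES[matches[0]])

print(reply("Where are u located?"))
print(reply("What are your times?"))
```

Anything outside the hardcoded table falls through to the fallback line, which is the limitation Michael points to: both the call and the response still have to be written by hand.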
Um, now I think these tools are able to look at the structure of language. And start understanding things at a deeper level and know that, okay, this is a noun, this is a verb, here's how they're strung together in English. Here are these concepts that are sitting in my database and that I can leverage.
And that's [00:07:00] the new thing. And I think, like, why it seems revolutionary is that we have a pretty clear picture in our heads of what a computer's supposed to do and what it's good at, right? It's good at math, it's good at repeatable tasks, that sort of thing. But it generating information and giving it to you feels very novel, even if, at the end of the day, the nuts and bolts are more evolutionary than revolutionary.
Alayna: So do you see this having staying power and just getting better and better or is it gonna kind of just be a passing trend?
Michael: I think it's too early to tell. I do think that at the end of the day, different companies and products are going to continue leveraging this.
I don't know if it's going to revolutionize or create industries, but it'll be a part of, [00:08:00] I think, products going forward. Similar to chatbots, right? Like, I mean, 10, like what, 10, 15 years ago? You had a phone tree and that was the only automation you had on customer support stuff.
And now it's almost like it's novel to talk to a person. Like, you hope that when you're texting a customer support person, you're talking to a real person, and half the time you're not. I think that that's gonna be closer to how things end up going. It won't be like, “Oh my god, apps are done, and search engines are done.” It's gonna be more like, this is something that is a tool, but part of a bigger thing.
Alayna: So speaking of search engines, you know how Google operates on the function of being able to produce multiple results to people and rank those best results and charging companies to rank [00:09:00] those best, those results so that there's more of that diversity in the search engine. So how is this gonna change SEO and search engine marketing?
Michael: I mean, so the thing is, like, how it'll change SEO. It'll probably make it easier for companies to generate the [search] terms and phrases.
Like hypothetically, you could imagine throwing into a model, you know, I am advertising a plumbing company. Give me all the keywords that are related to plumbing companies. And they'll just spit out a list. But the problem is because the tool becomes more and more ubiquitous, more and more people will use the tool.
Therefore, whatever advantage the first people have when they use that tool is gonna be completely evaporated. As far as SEO and marketing go, I [00:10:00] think at the end of the day, we're creatures of novelty. We want new experiences, we want things worded or structured in a new way.
And so if everyone's using the same set of tools to generate marketing content or SEO content or whatever, like it's not gonna, you know, grab us and so it's not gonna be effective as advertising.
Alayna: Yeah. But what about from the customer's standpoint? Because you know, right now people can put into Chat GPT, “What is the best car for this need, this need, this need?” And then it's like producing [results like], “Based on your information, here's the one best car for you.”
Michael: I mean, so there are a few things, right? Like, when you Google something or Bing something or AltaVista something, uh, you end up with, you know, it's not just the results, right? It's like the [00:11:00] results are ranked a certain way. There are multiple perspectives that are given. There's, you know, multiple different kinds of data, like whether it's images, videos, or what have you. Like, I can only think of a few information journeys where I just want a bot to respond to me with a single answer.
Like, how tall is the Eiffel Tower, I just want a number. That's all I care about. But like if I'm saying what's the best car for me, I'm not gonna trust if any individual person tells me. You know, no matter how well they've known me, what the best car is for me, I'm gonna want five or six different things. I'm gonna wanna compare, I'm gonna wanna manipulate them, I'm gonna wanna look at the sources. I'm gonna wanna see the local listings in my area. And that is very hard, right? To make a tool that will just give you all of that really simply. That's why search engines exist. That's why there's, you know, a fairly rich [00:12:00] landscape of them.
This is because people's information needs vary depending upon what they're trying to do.
Alayna: Yeah. So you don't think that search engines and you know, Google search would become obsolete with this?
Michael: I don't think so. I mean, I think that there'll be certain journeys that may get thrown into generative AI, but like there's a whole lot of stuff that would need to be figured out that a prospective generative AI tool would need, to help figure out.
Alayna: Yeah. What kind of issues do you perceive this having on the journalism industry? Or not even really issues, what kind of effects, both positive and negative, does the chatbot technology have on journalism?
Michael: So I think it'll drop the floor and kind of raise the ceiling a little bit. So I'd think about [00:13:00] clickbait sites. Right? Like those are all basically, you know, either done by people who are paid next to nothing to produce a whole bunch of them, or they're made by some sort of automation or scrape or something like that.
I think that with generative AI, once it becomes something that a consumer can kind of deploy and use regularly, that's gonna be what ends up getting replaced: it'll be something like an AI generating a bunch of clickbait stuff or generating listicles, kind of low-quality content. Because, at the end of the day, similar to advertising, people are gonna get bored with the generated stuff. The stuff that's really gonna click is something that an actual human has thought through and that has an interesting take, because the chatbot is still reliant on human creativity
to ask it the right question and try to massage stuff the right way. And at the end of the day, [00:14:00] there's a limit to how much it can create by itself.
Alayna: Yeah. That's interesting. We, uh, we have another person in here, my classmate Kingkini, and I talked about that. Like, I definitely see it easily creating this clickbait, you know, single perspective kind of article, but then people will have to kind of realize that that is what they're consuming and not necessarily hard-hitting factual journalism.
Michael: Well, and I think that that, yeah, so that, that leads to like raising the ceiling a bit, which can have a lot of downsides as far as people getting into the industries.
I think that it's going to basically create a much more crowded space. And so it'll probably lead to an intensification of the same problems journalism is having now, where there's just way too much content and people are not getting paid enough and are expected to do kind of crazy hours and work in kind of a [00:15:00] multimedia environment where they're tweeting and they're posting on Instagram, and then they sometimes have time to write stories. Like, that problem's gonna get worse, because now the creativity bar's gonna be higher. The field is gonna be so much more saturated.
Alayna: Yeah. And so one of the flaws right now with AI technology is the production of misinformation, which is partially a user problem, because if we put in a biased question, it'll give us a biased answer confirming that. Is that necessarily a flaw in the system that can be fixed, or is that just something that people will always have to take with a grain of salt?
Michael: So there's, there's like two kinds of big problems. Um, or two classes of problems. There are a whole bunch of problems, but there are two classes of problems that lead to either misinformation or bias, right? One is where users are [00:16:00] just acting in bad faith, and I think the onus will be on the developers of the tool to figure out, okay, “What's a question that's given in good faith? What's a question that's not? What's a question that can lead to kind of more fringy or misinformation-type answers?” And basically put a ban, like a ban hammer, on those right from the get-go. Have the chatbot say, “Oh, I can't answer that.” Or, you know, “Go talk to an expert.”
Here's a thing, right? Like, you can imagine something similar to what YouTube does now with attribution, where, if you're watching a video on climate skepticism, there's normally a little bar that comes up that says, “Oh, you know, according to all of these scientists, climate change is real. Here's a Wikipedia article.” You might have something like that.
Alayna: Google kind of does the same thing, right? If something is being put into the search bar that's, you know, more likely to produce [00:17:00] negative search results, Google kind of puts a little bit of a filter on that. So would the AI bots also put that kind of filter?
Michael: Yeah, and I think that would be what you'd have to do. It's an interesting problem cuz you also want to disincentivize the behavior. Mm-hmm. You don't want to just lead someone down an authoritative path.
You almost want to teach them that they're asking the wrong questions. Um, and like, then you get into interesting things like figuring out, okay, are they some trolling person who is just trying to ask something inflammatory? Or are they legitimately, like, someone who has real questions but is maybe misinformed?
So that could be very interesting as far as how you treat those different classes of people. The other big problem, which plagues chatbots, even the more basic ones that I used to work on, is just biases in the training data. I mean, unfortunately, a lot of human knowledge, at least the stuff that ends [00:18:00] up getting thrown into technology, and gets indexed and all of that, was written by white males.
All of those biases, all the historical biases, end up in the training data. And that's actually something that a lot of companies do care a lot about, you know, AI and machine learning fairness, and basically kind of de-biasing all the training data. There's another problem when you start indexing internet data: the loud stuff is not always the most correct stuff.
In fact, there seems to be a countervailing correlation where the louder something is, the less correct it tends to be.
Alayna: Yeah. So what are some of the flaws present in Chat GPT, and other chatbot technology, and what will be needed to fix that? How long do you think that would, what would it take?
Michael: I mean, I have, I have no idea. I'm not sitting on the other side of that. I have no idea [00:19:00] what the main list of things they need to fix is. I mean, as someone who's used them, a lot of it just comes down to bad data, and like a lot of machine learning projects end up sort of being that same flavor.
There just isn't enough data. It's not robust, they haven't given enough both positive and negative cases to train it well. And there's also just a lot of pieces of information that just aren't there. A human being like, you can ask me a completely ridiculous question and I can make up something.
You can't actually do that, even with something that seems kind of humanistic. You can't go at it with, like, an aphasia question, “You know, potato, bridge, twinkle,” and expect it to spit out something reasonable. I think there's still just a lot of obscure stuff kind of in that vein also
that isn't modeled and isn't in there. [00:20:00] And then it just becomes a matter of time and attrition to, like, get as much stuff in there as possible.
Alayna: Yeah. What kind of advances do you see it making, and where do you see this technology going in the next, uh, six months to a year?
Michael: I mean, the thing is like in a lot like it's, it's similar to blockchain, which I have thoughts about blockchain. But similar to the blockchain as a technology, it really just depends upon what people use it for and how investors and thought leaders, and the media respond to how they use it, especially in tech, right?
The kind of capitalism that we exist in has a very itchy trigger finger and is quite twitchy. We'll just go after the shiniest thing that's getting the most buzz. And so it's kind of hard to pin down where the technology is gonna go. [00:21:00] Also technological innovation is quite chaotic.
If you look at the history of various innovations, microwave ovens were invented by a radar technician whose chocolate bar melted as he got close to a radar antenna. And so like the first microwave ovens were designed and produced by defense contractors, which is crazy.
Like, just kind of think about it from a logical perspective. No intelligent, super-intelligent being who was designing a civilization would have that path for innovation. So honestly, it's really hard to tell. I do think that what I said before about journalism applies. I think that it will be used for a lot of things like, you know, mass, mass, mass, mass, low-quality media.
But as far as doing good things, I am not a hundred percent sure where it's gonna go. I think one thing that's interesting, to me at least, is the procedural generation of things like game content. There's [00:22:00] a lot of literature that goes into video games and even movies and such that's sort of filler,
right? That's conversational. It's just to add color, or to get a basic piece of information across. You can do that, right? And it'll feel right, and it'll save writers, you know, hours upon hours of time writing all of that stuff.
Alayna: Yes. So ultimately, rather than making human writers obsolete, it'll make bad writers obsolete. And you think it'll push toward higher quality?
Michael: Yeah, well, not even. Like, it will do that a little bit, but I think it'll give writers, I mean, this is a good thing and a bad thing. It'll give writers kind of a higher throughput. Because, instead of, say, having to write all, you know, 500,000 words of a book, you could theoretically [00:23:00] have some generative thing spit out a basic template of a genre, and then you can start filling in the details and really make it your own.
Or, you know, something like a movie script or, you know, video game writing. You can kind of have all the filler and conversational stuff kind of handled and then tweak it or remove it, and, like, that'll save some amount of time. Which will allow them to do more stuff, or expectations will be higher on them to produce more stuff.
Yeah. Because they'll have these tools to automate their job. Yeah.
Sonya: I have a question. Michael, you look like someone who maybe plays Dungeons and Dragons.
Michael: What slanderous lies.
Sonya: I was just wondering if you've ever used any AI to generate stories for games like that? Cause you did mention games.
Michael: No, that's a good question. I have not, but I know people who have. I definitely know people who have used [00:24:00] writing generators for D&D campaigns and for story prompts.
Alayna: Oh, that's cool.
Michael: Yeah. So story prompts is another good one, but I don't know what that would do to any creative industry but prompts for sure.
Alayna: I'm actually in marketing and we've been using it for re-optimizing our content. Not necessarily writing anything, because it still writes like a fifth grader with the standard five-paragraph essay format. But we have been using it to create outlines, because we were kind of on the fence about whether we want to have it write and then we go back and edit and put that human spin on it, or whether we want it to create the outline and foundation of our articles and then we write it from our perspective. And yeah, so we've been using it for that. I wonder if in [00:25:00] the marketing industry, uh, or digital content marketing, that's where we're gonna see companies be like, “Well, why would I have a human, you know, write this content when I can have a robot do it for me?”
Michael: Probably. I do think then you would. I think at that point, you would have copywriters put out of work, but, I don't know, you would necessarily still have to have someone come up with a creative vision.
It would just be instead of a team of six copywriters turning that vision into a campaign, you would have three and some tools. But then again, the ceiling is also higher because the people who are then there need to figure out, how do I, parsimoniously, get the idea of this creative ad campaign out of my head and into the tool, and how would I handle all the edge cases and stuff like that.
Alayna: So how do you use this technology in your own [00:26:00] professional or personal life?
Michael: Uh, no, I just play with it right now. I ask it stupid questions like, “I am a banana sandwich. What is my life like?” And crazy, stupid shit like that.
Alayna: So, I think that that is all the questions I have. Anything last minute to add?
Michael: I guess I would say, as we're coming out of, like, the NFT, blockchain, crypto, you know, boom, and even, like, the sharing economy is dying down, this is kind of the new hotness. I don't know. I would not be too concerned and also not too elated. Because I think this is just another thing that's kind of [00:27:00] getting put on the pile of technology. And just, I don't know, it'll do a lot of good things. It'll probably do some bad things that will have economic impacts, but don't be too stressed about it.
Yeah. I think this is the right way of putting it. Don't be too stressed about it. Either way. Don't go running to invest in every new something that has ‘generative’ in the name.
Alayna: I mean, as long as it won't turn into, like, Eagle Eye. I think that's the one movie that freaks me out the most about AI technology, where, like, the military thing is pissed off at the US government for going against its advice.
Michael: One thing off the top of my head that would be kind of cool with it is for academic writing. In the social sciences, it's a little better, but in the physical sciences, these people don't write for humans in a lot of cases. And so having something that you can give a paper and say, summarize this so that someone with an [00:28:00] 18-year-old's education or reading level could understand it.
Alayna: I have actually seen people do that or heard of people doing that. Like taking, you know, their assigned reading and just telling the bot to summarize the main points, which is basically just, you know, how we used SparkNotes in our generation.
Michael: Well, the other benefit is that, like, it would help you test how well you've written. So, like, there's Feynman's... a lot of stuff is named after Feynman. But one of the things he's famous for is a logical test for a scientific concept. If you can explain the scientific concept without leveraging any other terms from within that discipline, you understand it and can express it well.
You know, have a physicist explain energy to you without using words like force, velocity, or mass. And that's a good physicist. And most of them [00:29:00] can't.
Alayna: I did hear that they just launched a new version of Chat GPT, Chat GPT 4. So the first version, or the previous iteration, they tested it on the LSAT and it scored in the 10th percentile, which was not very good.
But then the new version, Chat GPT 4, was able to score in the 90th percentile. And they've just been giving it all these standardized tests and seeing that on the GREs it's also scoring super high.
Michael: I mean, just like humans, it was being trained to do well on the test.
Doesn't mean that it's any smarter, like, similar to humans. That's the other problem with this training. Machine learning has the same kind of problems as teaching a person: if you hyper-optimize for anything, it's gonna optimize for that thing.
If you don't give it a breadth, like [00:30:00] similar to the biased stuff I talked about before, if we don't give it a breadth of knowledge, it's not gonna be like a whole virtual person.
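Editor's sketch: Michael's point about hyper-optimizing for a test can be illustrated with a toy comparison between a "model" that only memorizes practice questions and one that has learned the underlying rule. This is a minimal sketch under stated assumptions; the arithmetic questions and the names `memorizer`, `adder`, and `score` are hypothetical, and nothing here reflects how any real model was trained or evaluated on the LSAT or GRE.

```python
def memorizer(practice_pairs):
    """A 'model' that only remembers the exact practice questions it saw."""
    table = dict(practice_pairs)
    return lambda question: table.get(question)

def adder(question):
    """A 'model' that has learned the general rule (here, addition)."""
    a, b = question
    return a + b

def score(model, pairs):
    """Fraction of questions the model answers correctly."""
    return sum(model(question) == answer for question, answer in pairs) / len(pairs)

# Hypothetical practice test and unseen questions, as (question, answer) pairs.
practice_test = [((1, 2), 3), ((2, 2), 4), ((5, 1), 6)]
new_questions = [((3, 4), 7), ((10, 10), 20)]

memo = memorizer(practice_test)
print("memorizer on practice test:", score(memo, practice_test))
print("memorizer on new questions:", score(memo, new_questions))
print("adder on new questions:", score(adder, new_questions))
```

The memorizer looks perfect on the test it was optimized for and fails on anything new, which is the gap between scoring well and having a breadth of knowledge.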
Alayna: Yeah. All right. Cool. Well, thank you. Thanks for meeting with me and, thank you, Sonya, for connecting us. I will pass this along. Sonya will, I guess talk to you later.
Michael: Yes, yes. Bye. Great meeting you, Alayna. Great seeing you again, Sonya. Talk to you later. Bye. Bye.