This week I speak with Chris Longstaff of Mindtech about their curious plan to help machine learning by augmenting datasets with synthetic data.

I also cover alternative calendars, the woes of WeWork, robot writers and more.


Chris Ward: [00:00:00] Welcome to the Weekly Squeak. Your weekly [unclear] with me, Chris Chinchilla. I don’t know about you, but I’m losing track of time right now. I don’t know if it’s been a week since the last episode, or what week it is. I barely even know what day it is most of the time at the moment, and the holidays did not help, or did help, [00:00:22] I don’t know. Anyway, I’m sure you all feel much the same. Time is getting squishy and wishy-washy and hard to comprehend. Anyway, let’s carry on with the Weekly Squeak. I have an interview with Chris Longstaff of Mindtech in this episode, and I’ve got a handful of links for you and a few bits and pieces of news. [00:00:42] So let’s jump straight in to the links. [00:00:47] This is not a new article. I don’t always mention new articles. I mention things that have crossed my plate, that I’ve come across and that have interested me as I’ve been researching other things. So this is an article from The Guardian from last year, [00:01:00] from Steven Poole, about robot authors. I’ve been looking into this because I had a strange idea with my wife, probably a year ago now, when she mentioned some of the books she reads. To me they sounded very formulaic, and I said to her, kind of jokingly, I think a computer could write those. [00:01:21] It’s basically: plug in these various variables and then just write the same thing each time. And of course this was also in 1984 and, I think, probably some other dystopian books as well. And it actually got me thinking, could we do this? Could we write a sort of mass-market romance book with an AI? [00:01:45] I’ve been doing some digging into this subject.
And actually, I came across a few things this week that were going into this, and some playgrounds where you can experiment with getting an AI to finish sentences for you and things like that, which isn’t quite what we were doing, but [00:02:02] it was an interesting experiment to see how possible it was. This article refers a lot to the GPT-2 library put out by OpenAI last year, which they then withdrew again. Strangely, one of the playgrounds I found was using it. I don’t totally understand how they’re using it if it isn’t available, or whether there’s a subset available. It’s slightly confusing to understand exactly what it is, because you sometimes end up reading [00:02:28] somewhere between very high-overview think pieces and really in-depth pieces about natural language processing, with not a lot in between. So it’s hard to know exactly what you’re reading sometimes, but this article is very much a high-level overview of what is possible. [00:02:45] There are already some robot authors used in some journalism, especially highly statistical journalism like financial reporting or sports reporting. And I definitely know writers are already using assistants like Grammarly and things like that, which do not write for you, but prompt you to change your writing in certain ways. [00:03:03] I’ve always wondered how they are affecting your writing. I actually did a presentation on this subject last year, or the year before last, at SoapConf, and there was a video of it, so you could pick that out. I’ll try and find it and add it to the notes for the show. It was an interesting article that had a few good links out to some of the things [00:03:24] I then dug into in a bit more detail. So if you want to find some of those and have an explore yourself, then have a look and see what you come up with.
Send me the links to what you end up with, or what your AI ends up writing for you. Now an article from Wired by Emanuele Midolo. I seem to have what I should call SoftBank corner on the podcast, but this one is specifically to do with WeWork: [00:03:52] “Don’t blame coronavirus for WeWork’s collapse. Blame WeWork.” WeWork is collapsing, maybe. I don’t know, I haven’t really heard any confirmation of this, but this article implies they are. Obviously coworking is not a business model that works very well when no one can use it, [00:04:12] and lots of people are canceling coworking memberships. I know from some of the coworking spaces here in Berlin that they are struggling. It’s very difficult to replicate what they do, and people are canceling their memberships. Corporate offices are canceling their memberships, which is where a lot of these places actually make their money, et cetera. [00:04:31] It’s quite a hard time for coworking, especially if you are a coworking space propped up by a lot of money from an outside investor, that has never been profitable and has a lot of very expensive real estate. So I think WeWork are trying to blame coronavirus, but the problems were already there. Of course, we knew about their failed IPO and things like that. [00:04:53] They’re using it as an additional excuse, but it was not the beginning of their problems. And this is exacerbated and summarized in news from April the first, so two weeks ago, that SoftBank decided not to buy $3 billion of their stock, dealing a blow to shareholders and also, I guess, to confidence in the company. [00:05:19] And I have seen articles coming up in the past couple of days showing that SoftBank are equally, finally, hitting financial troubles. And I was wondering how long that would take.
If you are regular listeners to SoftBank corner, you’ll know they’re an investor who has regularly, if not constantly, invested large amounts of money in companies that are not profitable. [00:05:41] There will come a point where that becomes a problem and, yes, it looks like that problem is coming. What will happen to Uber? I have seen lots of Uber cars driving around, but I don’t see any people using them. So they may start getting hit by very similar problems as well. Now, this is an old article, [00:06:00] nothing new at all, from a blog that I doubt we’ve ever featured on the Weekly Squeak, Listverse. It was just a good summary of what I wanted to discuss. This is something by Elizabeth Anderson called “10 Bizarre Calendars From History”. I started looking around the subject after a conversation with a friend, who mentioned the International Fixed Calendar, a calendar that has 13 months, first proposed by a worker from the British railways but actually popularized by Eastman Kodak. [00:06:33] I have to go and find the number here: for nearly 70 years Kodak actually operated on this calendar, right up to 1989, and probably did a lot to popularize it. Actually, I should explain what it is. It has 13 months that each have 28 days, so it’s very easy to know how many days are in a month, et cetera. [00:06:59] But one of its main problems was, of course, that lots of public holidays would have to shift, and this is what most upset people. And as with any alternate calendar, and of course there are nine more in this article, there are problems with communicating with the outside world. There are examples of this in various ways. [00:07:18] The Japanese have their own form of calendar. The Jewish world has its own form of calendar. The Arabic world has its own form of calendar.
There are actually many alternate calendars in partial use, but generally people also recognize another calendar, the calendar that most of the world uses, [00:07:36] which is the Gregorian calendar. One interesting thing to note, especially with the Gregorian calendar but with other calendars too, is that, as is often the case with things from history, it’s maybe not as old as we think, and it was not always used. I suppose as global trade became more and more important, it was more important to solidify around one thing. [00:07:59] But there are still definitely alternate calendars used by lots and lots of people. Depending on where you go in the world, people would possibly be very confused by you referring to the date as whatever the date is today, something in April 2020. They might think of it as something else. And this is quite an interesting article for looking at some of those alternate views, some from history, some not so old, and the various [00:08:24] positives and negatives of those calendars. What’s your favorite calendar, is a question for you. And now an article from the MIT Technology Review by Will Douglas Heaven. I’m sorry, I was trying to avoid the subject, but I’m mostly skirting around the edges of it: “Why the coronavirus lockdown is making the internet stronger than ever”. [00:08:47] I think a lot of people thought the internet would really struggle with so many people working from home and streaming and, granted, to begin with it did. But it has mostly stabilized, I think, through various measures. It depends where you are in the world, of course. But this connects nicely with something I’ve witnessed happening around Berlin as well, in that it has kickstarted a needed upgrade to certain bits of infrastructure.
[00:09:12] Companies are rolling out upgrade programs that were probably scheduled, but are now needed more than ever, and that is helping keep things going. They are also making compromises. For example, lots of the streaming services said to the European Union that they would reduce stream quality from 4K, because you could argue, do you need 4K [00:09:35] anyway? So it helps everybody. And this has been interesting to see. As I say, this is something I have noticed happening in Berlin in the physical world. Well, the internet is obviously physical, but I mean something you can see. I’ve noticed lots of building projects getting completed really rapidly. The workers are still working there, because [00:09:55] there are fewer people in the way. I’ve noticed various infrastructure upgrades to the public transport system here, and building sites getting finished much quicker because there’s less traffic and less public transport, meaning fewer people around. So strangely, whilst we’re all not using it, the world is getting upgraded around us. Again, depending where you are, I suppose. [00:10:16] And the same thing is happening to the internet. This has often been through statistical analysis. Some of the various companies involved in keeping infrastructure going, or providing infrastructure, are noticing where spikes are in certain parts of the world and shifting resources around to help. This is something they’ve always done, but they’re doing it on a much larger and different scale. [00:10:38] For example, the traditional centers of some cities where business was happening are obviously not using much capacity right now. In fact, they’re probably using very little, whereas in the suburbs of some cities people are using more. So they’ve shifted a lot of the capacity around to match, and this actually relates a lot to the initial structure [00:10:58] and design of the protocols that run the internet.
It was supposed to be flexible, and it’s showing that it actually works quite well. It’s fairly old technology, so it’s good to hear there’s more life in it yet, and we will all be allowed to keep streaming our meetings for who knows how long. [00:11:21] All right. Now here’s my interview with Chris Longstaff of Mindtech. This is an interesting company, as you will hear. Chris was basically trying to show me a presentation as he talked, and that is definitely going to come across in the interview. I tried to edit it a bit so it doesn’t sound quite so much like that, but he does occasionally refer to things [00:11:43] that you can’t see. I will find out if I can provide the presentation in the notes so you can refer to some of what he was referring to. But I hope there’s enough left in the audio that you understand what the company is trying to do, because it was quite fascinating.
Chris Longstaff: [00:12:01] So I’m Chris Longstaff. [00:12:03] I’m the VP of product management for Mindtech Global, and I’ll take you through a little bit about our product, Chameleon, today. Basically, Mindtech is a UK-based startup, founded in 2017, that really started working in 2018. We’re really all about producing solutions for AI, particularly the tool chain that we’ll talk about today for synthetic data generation, [00:12:36] and obviously focusing on business-to-business markets rather than directly on the consumer. A very experienced team, you’ll see there, in terms of both the leadership and the board of directors. In terms of target markets, because of what we’re doing, it’s fairly broad. We’re really talking about synthetic data, generating the data that I’ll talk a lot about.
[00:13:01] So really going across a large number of markets, though focused a lot at the start on smart vision, particularly in terms of retail and the safety and security parts of those markets. Obviously, for any AI system, there are a few key elements. There is the architecture itself, and obviously these have evolved rapidly over the last few years, and we have different architectures going from [00:13:33] AlexNet and GoogLeNet through to more modern ones, so ResNet, VGG, all these kinds of different networks. Most of these will obviously be developed under some sort of framework, TensorFlow, PyTorch and so on. But in order to make that do anything useful, obviously we’ve got to train it with a huge amount of data. [00:13:54] As the famous quote from Andrew Ng says, it’s all about the data. And really, what we’re trying to do is get these machines to understand, and to do that they need labeled data. It’s an interesting question that we get asked by our customers as well: how many images do I need to train a network for certain things? [00:14:17] And obviously it’s not all about the quantity, it’s about whether they are the correct images. There are a few examples here. In the automotive world, if you’ve recorded lots and lots of straight roads, then that’s great. But when you come to something like what we have called the Magic Roundabout in the UK, which is one big roundabout with five smaller roundabouts around it, [00:14:42] anyone who’s not a fairly experienced driver and understands roundabouts will very quickly become confused. We have this concept as well of data drift, and what we mean by that is that the requirements for data change over time, sometimes more rapidly, sometimes more slowly.
[00:15:04] But, you know, the example there of wristwatches. Obviously they’ve changed over time, so if you’re trying to recognize a wristwatch, then training something on pocket watches isn’t going to give you the right results when you’re looking for an Apple smartwatch. And again, understanding a scene can be very, very tricky. [00:15:23] That first image there: is that people fighting? Are they dancing? What’s going on there? Will your camera recognize that in the middle as a pair of cows, or does it recognize it as the advertising poster on a bus? [00:15:46] The problem here is obviously that we need to have data and, as I said, a lot of high-quality data, and we need access to that. There are very, very few companies who have access to those very large amounts of data, and even for those that do, it’s perhaps questionable as to their rights [00:16:08] to use it. And certainly those that do mostly don’t want to share that data. There are several key issues with real data that we’ll cover, and how synthetic data can help overcome them. Real data is very expensive to get hold of. If I go out and record things, it’s very expensive and [00:16:33] takes a long time. There are real issues around privacy. Facial data is the very common example, obviously, but even things like seeing vehicles and having vehicle license plates stored can be a key issue for anyone whose system is hacked and has this kind of data stored. [00:16:56] Getting that data not to have bias in it is a very, very difficult problem, and that is where having a very broad spread of data is critical. Getting the annotation accuracy is very, very hard for real data, and very expensive. The more accurately you want to annotate something, the more it is going to cost you.
[00:17:23] So our solution is synthetic data generation. The way that we do this is we create a virtual world, whether that’s a hospital, like the image in the top right there, or an underground station. We add the assets that we’re trying to look at in there, so that’s the cars, the people and so on. [00:17:44] We configure times of day, global variables like weather, and so on. We then effectively film this, so we create a virtual film. And obviously because we’ve created that virtual film, we have a full understanding of what the objects are. We placed them in the scene, so we can create the output masks and so on. [00:18:05] And then we have a full tool chain to manage the data that we’ve generated. There are a huge number of advantages to using synthetic images and data. I’ll talk through a few of them in a bit more detail. I’ve mentioned some already: fast creation of very large datasets; [00:18:31] the adaptation, again, to data drift. Obviously very topical at the moment with coronavirus: how do we adapt systems which may want to check for things like people wearing face masks? Then there is getting corner-case data. A drone over an airport can be very, very difficult to film in real life, but we can create that kind of corner-case data using synthetic images. [00:18:55] We can rapidly change any environmental conditions, so we can view the same data in different conditions, and we can do that effectively instantly. We have many different kinds of sensors that we might want to model, not just RGB cameras. We may want to model LiDAR or radar as well as the traditional RGB sensors. [00:19:22] Multiple cameras at the same time: clearly you can do this in real life, but it’s actually very difficult to synchronize the cameras and make sure that you’re looking at the exact same data at the exact same time.
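The configuration steps described above (a scene, placed assets, time of day, weather, and several sensors) can be pictured as a small data structure. The Python sketch below is purely illustrative: the `Scenario` and `Camera` classes, their fields, and the `environmental_variants` helper are hypothetical names invented for this example, not part of Mindtech's Chameleon tooling. It only shows how one base scenario might be expanded into the same scene under different conditions.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class Camera:
    """A hypothetical virtual sensor placed in the scene."""
    name: str
    sensor: str  # e.g. "rgb", "lidar", "radar"


@dataclass(frozen=True)
class Scenario:
    """A hypothetical scenario definition: scene, assets, sensors, conditions."""
    scene: str
    assets: tuple
    cameras: tuple
    time_of_day: str = "noon"
    weather: str = "clear"


def environmental_variants(base, times, weathers):
    """Return one copy of `base` per (time_of_day, weather) combination."""
    return [
        replace(base, time_of_day=t, weather=w)
        for t, w in product(times, weathers)
    ]


base = Scenario(
    scene="underground_station",
    assets=("pedestrian", "pedestrian", "luggage"),
    cameras=(Camera("ceiling", "rgb"), Camera("ceiling_range", "lidar")),
)

variants = environmental_variants(base, ["dawn", "noon", "night"], ["clear", "rain"])
print(len(variants))  # 6
```

Because every variant shares the same scene, assets and cameras, only the conditions differ between renders, which is the property that is hard to reproduce when filming in the real world.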
And tuning those images: you see the example there of a fisheye-type lens. Making sure that we can match that to the system that’s actually going to be used for the inferencing is very, very important. [00:19:53] So slide 14, quite a detailed slide, shows effectively what our platform and solution is. It comes in a number of different elements. The key is the tool chain that we have in the middle. So we have the scenario editor, an asset importer, which we’ll talk more about, the simulator and the dataset tools. [00:20:18] At the top, we have a number of what I would call creative elements, which are what are going to configure the scenario for us. So create, if you like, the film set, put the actors into that film set, and model exactly what the cameras are going to do. Then we generate that data, and the dataset tools are going to take our data, take customer data, merging in real images, and then go through that standard [00:20:46] neural network framework to create the trained network. All the time, what’s important is, at the bottom left-hand side, that customer use case. We have to keep bearing in mind: what’s the end objective for this? What’s the customer trying to do? Are they trying to create a pedestrian detector? [00:21:05] Do they want to try and identify animals because they’re causing false positives on doorbells? All these kinds of things. So that customer use case is critical. That is, if you like, the cornerstone of everything that we’re doing. So just a little bit about how we go about creating these things. [00:21:25] We’ve created a scenario editor, and the reason for this is, obviously, we don’t expect that the data scientists we’re working with are 3D experts. The data scientists are there to work out the algorithms, work out,
you know, what kind of training data they need, but we’re not expecting them to be the 3D experts. [00:21:47] So we create the application packs, including the scenes, the actors and various different objects we might want to place into that scene, and then have a full UI for the scenario editor, where we can effectively drag and drop the different elements with a real-time preview. So the data scientists can say, okay, I see what I want to see here. [00:22:08] I want to check that the people in this office are perhaps wearing face masks, because that’s the new policy. They will see in real time how this is going to look on the film, and then they can use that to effectively create the whole film set with actors and visualize that before going ahead and running the simulation. We then send that scenario definition, [00:22:36] including all the details of the scenes, the assets, the locations and all the behaviors, to the simulator. That’s going to create a number of outputs for us. It’ll create the visible image. We may want that visible image to be in Bayer space rather than RGB. We might want additional data like LiDAR range data. [00:23:00] We want some mask data, so we obviously want individual objects to be identified. One of the key things about synthetic data, as I’ve mentioned, is that we can uniquely identify every single object. In that instance there, you can see the pedestrians are identified. Actually, every single pedestrian there has an individual ID. [00:23:24] So if we want to do re-identification, for example, we can do that quite easily, because the same pedestrian will always be identified with the same ID.
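To make the point about persistent IDs concrete, here is a minimal Python sketch. It assumes a made-up annotation layout of `(frame_index, object_id, class_name, bounding_box)` tuples, which is not Mindtech's actual output format, and the function names are hypothetical. Because the simulator placed each object, the ID comes for free, so finding a pedestrian who leaves the frame and later returns reduces to grouping rows by ID.

```python
from collections import defaultdict

# Hypothetical per-frame annotations: (frame_index, object_id, class_name, bounding_box)
annotations = [
    (0, 7, "pedestrian", (10, 20, 50, 120)),
    (1, 7, "pedestrian", (14, 20, 54, 120)),
    (1, 9, "pedestrian", (200, 30, 240, 130)),
    (5, 7, "pedestrian", (90, 22, 130, 122)),  # pedestrian 7 returns after leaving the frame
]


def frames_by_object(rows):
    """Group frame indices by object ID, sorted in time order."""
    tracks = defaultdict(list)
    for frame, object_id, _class_name, _box in rows:
        tracks[object_id].append(frame)
    return {oid: sorted(frames) for oid, frames in tracks.items()}


def reappearances(tracks, min_gap=2):
    """Return (object_id, last_seen, seen_again) for every gap of at least `min_gap` frames."""
    found = []
    for oid, frames in sorted(tracks.items()):
        for previous, following in zip(frames, frames[1:]):
            if following - previous >= min_gap:
                found.append((oid, previous, following))
    return found


tracks = frames_by_object(annotations)
print(reappearances(tracks))  # [(7, 1, 5)]
```

With real footage, the equivalent of the `object_id` column has to be supplied by a human annotator, which is the difficulty described next.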
If you’re using real data, it’s very, very hard for someone to re-identify a single pedestrian, particularly if it’s some time later. If someone’s gone off screen for 10 minutes in a retail environment and then comes back 10 minutes later, for the person annotating, working out that that’s the same person is exceedingly difficult. [00:24:01] There are, of course, other advanced annotations that we get, things like velocity vectors. Again, generating those from 2D data is virtually impossible, as are things like 3D bounding boxes, which are very, very difficult to do in the traditional way, where you take a 2D film of something and then annotate it by hand. [00:24:26] Of course, it’s very important that we have great results from this. We have a couple of examples here of it working. The first one is actually a third-party paper that we wanted to share, just to show that it’s not just Mindtech who believe in this technology. What we see here is that training is done either with [00:24:51] synthetic only, real only, or a combination of the real and synthetic. The number one there is basically synthetic only, and we see we get very, very good results from that synthetic data. In fact, a lot of customers are coming to us because they have an idea. [00:25:12] They think that they might want to do something, but investing in recording real-world data is very, very expensive, so being able to prototype using synthetic data first makes a lot of sense. You’re not going to get the ultimate accuracy from that, but it at least gives you an idea of whether your algorithm can work or not. [00:25:36] Over on the right-hand side there, we see that with a big real-world dataset, we get the best results when we combine that with the synthetic data.
So that bar under three there is showing that we have the best possible accuracy using a full synthetic dataset and a full real dataset. [00:25:59] What we also see, in the middle there under two, is that we could combine a smaller real-world dataset, augment that with the synthetic data, and end up getting a result which is actually superior to spending all that time and money generating a huge real-world dataset. [00:26:22] So synthetic data will get us much faster to market, because after a shorter amount of time collecting real-world data and augmenting that with synthetic, we’re going to end up with some very good results. And this one’s from our automotive pack here. What we’ve done here is use the KITTI dataset, and we’ve augmented that again with our own data [00:26:50] from our simulator, with a city scene with cars and pedestrians. And we see that the detection accuracy for cars goes up 6.9%, and for pedestrians up 8.4%. So quite a significant result in terms of the improved accuracy. [00:27:16] There’s another couple of critical things that we can do with synthetic data. I’ve mentioned this one before: optimizing for the deployment system. What we do here is generate what we call gold data, that is, a very clean image. Now, at the end of the day, what is going to be used by the deployment system might look very different. [00:27:45] Obviously in the example there it’s a fisheye lens that’s being used. It could be a very noisy sensor, and so on. We have the gold data here, which we can then augment and modify for the customer-specific system, which means that
when the system is actually deployed, the training data that’s being used represents that deployment system. That’s also going to optimize it, [00:28:12] so if you change sensor manufacturers or lens manufacturers halfway through a run, you don’t have to throw away your algorithm. You just have to augment and train with new data based on the new camera system. So, just an example here showing three different cameras: a fisheye, a black and white, and a very low-cost, noisy camera. Each of these can be modeled using that very same golden data, [00:28:47] but it means that the AI system is going to be optimized for your deployment system. [00:29:00] I’ve mentioned this before, but obviously we have a very flexible approach to cameras. We can have multiple cameras together. These cameras can be visual cameras, like RGB cameras, but also things like a LiDAR camera, a radar camera, a thermal camera. So effectively, from the same position, we can create multiple cameras and automatically synchronize all of those, so if you’re trying to train something to understand what the world looks like from multiple different cameras, [00:29:38] then you have absolutely synchronized datasets. [00:29:45] One of the other key things, which is becoming very important, and we see this within the world of AI and particularly with ethics, for things like automotive, is that it’s very important to be able to show where your data has come from or how you created it. With our Chameleon system, [00:30:08] we have effectively a tagging system which gives provenance tags, which enables anyone to go back and show, well, this is the exact data that was used to create that training data, which was used to train that AI system. So we can
basically tag all of the data, and even if it’s augmented and so on, we will tag that data, so that as we have the data pipe library with all of the images, masks and annotations, that’s effectively all tagged with the provenance and stored, [00:30:47] so that later on you can go back and show exactly where that training data came from. As I say, for something like automotive, it’s critical to be able to have that provenance. I’ve mentioned data drift already. Obviously there are new use cases coming along where we need to react very fast, and going out and filming [00:31:09] and generating that new data is very difficult, very time-consuming and expensive. So with something like coronavirus, we can create the synthetic data, the masks, the virtual objects, for new use cases. Perhaps it’s monitoring how many people in a queue are wearing a mask. [00:31:31] Perhaps it’s making sure that tables in a restaurant are regularly cleared. Perhaps it’s looking at re-identification of people to understand who’s a repeat buyer where there are restrictions in terms of purchases. So something like synthetic data can help us react much more quickly than we would be able to if we were relying on traditional real data. [00:32:07] So some more benefits, which I’ve already mentioned a little bit, but certainly things like bias reduction. Bias can be anything which causes a dataset not to have a balance in it. It could be that perhaps all the faces are taken from the wrong angle. Obviously with synthetic data [00:32:29] we can repeatedly rotate the face and generate images from lots of different angles to help train on that dataset. I’ve mentioned privacy. It’s clearly a big issue for training data.
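The provenance tagging described earlier in this answer is not specified in any detail, so the following Python sketch is only one plausible way such tags could work, assuming content hashes and parent links. The `provenance_tag` and `lineage` functions and the record fields are hypothetical and are not Chameleon's actual scheme. Each item is tagged with a hash of its own bytes plus the tags of whatever it was derived from, so an augmented image can be traced back to the simulator output it came from.

```python
import hashlib
import json


def provenance_tag(payload, parents=(), note=""):
    """Build a tag record from the item's bytes and the tags of its source items."""
    record = {
        "content_sha256": hashlib.sha256(payload).hexdigest(),
        "parents": sorted(parents),
        "note": note,
    }
    # Hash the canonical JSON form so the tag covers content, lineage and note.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["tag"] = hashlib.sha256(canonical).hexdigest()
    return record


def lineage(tag, registry):
    """Walk back from `tag` and return every ancestor tag, nearest first."""
    seen, queue = [], [tag]
    while queue:
        current = queue.pop(0)
        for parent in registry[current]["parents"]:
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


registry = {}

gold = provenance_tag(b"rendered gold image bytes", note="simulator output")
registry[gold["tag"]] = gold

noisy = provenance_tag(
    b"gold image plus sensor noise",
    parents=[gold["tag"]],
    note="camera-model augmentation",
)
registry[noisy["tag"]] = noisy

print(lineage(noisy["tag"], registry) == [gold["tag"]])  # True
```

A real system would also need to record the scenario definition and simulator version that produced each gold image; those details are left out here because the interview does not describe them.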
And again, alongside the provenance that I’ve just talked about, there’s data privacy here. Using synthetic data allows us to generate scenes without the risk of people or businesses becoming [00:33:08] concerned about the privacy issues. I’ve mentioned this case already, the drone in an airport. If I really want to understand, is it a bird, is it a drone or not, and I’m trying to create that dataset, then there’s no way that this could even be filmed in the real world. No one’s going to allow me to film drones near airports. [00:33:29] It would be very dangerous. So we can create synthetic data for those difficult corner cases. Again, I think I’ve mentioned most of these: the advanced annotations that we can get basically allow new use cases. So if you have something like a time-of-flight sensor, which perhaps can create point clouds, measure velocities and so on, we can use that in conjunction with visual cameras, and the synthetic data can provide us training data for that, [00:34:09] to teach AI systems new algorithms and new use cases that we just couldn’t achieve with manual annotation. And just lastly, obviously, the whole of our simulator is designed to be very, very scalable. We’ve done some work with multiple companies, but AMD in particular, on ensuring that we can scale across multiple different CPUs and GPUs to make sure that our [00:34:44] simulator is highly scalable. So where you want to generate massive quantities of data, we can generate that data very efficiently by using all available compute resources.
Chris Ward: [00:35:05] So the synthetic data, where does it come from? Is it from your libraries, or do people create it themselves?
[00:35:12] Chris Longstaff: [00:35:12] So if I go right back to, probably, this diagram, or maybe actually this one. What we do: we would get some scenes. So the buildings you see there in the application park, maybe if I can enlarge that a little bit, it probably doesn’t look great, and I don’t know if that shows in the live view, but yeah, [00:35:37] we create that scene. That scene would basically be a 3D object, and we would get a graphics studio to create that for us. We’re not artists, so we get a graphics studio to create that for us. Similarly with the people, the objects, all of those kinds of things: we would get a studio to create that for us. [00:36:00] And then what we do, like the scene that you see here, is the scenario. That’s where we place the people in the scene, we place the items in the scene and the camera in the scene, so that we’re then determining, okay, how are we going to effectively generate images from this. [00:36:22] So, in effect, think of it as a virtual film set. We’re licensing the graphics, but they don’t have any behaviors. So, for example, we might license a person, but they won’t know how to walk or how to interact with the scene. They won’t understand what particular behaviors they should exhibit. [00:36:45] So that’s sort of all of the things that we are doing. [00:36:50] Chris Ward: [00:36:50] And what kind of minimum quality do you think models should have to make them useful to people in their training data? [00:37:03] Chris Longstaff: [00:37:03] That’s a very interesting question. It’s going to depend very, very much on use case. 
[00:37:13] We’ve found, for example, that to do a pedestrian detector, the quality can be fairly moderate, because it’s more about the position, the angle, and making sure that they are anatomically correct. It’s more about teaching the machine to see against different backgrounds, different lighting conditions, [00:37:37] and so on. If you’re going to go and try to do some facial recognition training, then obviously you need to make sure that you’ve got something which is fairly anatomically correct, and you need some fair accuracy in terms of that. So it really does depend on the use case. But actually, sometimes [00:38:02] it’s difficult for us, because people would see something and say, well, that doesn’t look very realistic, and then we have a problem to sell it. But actually, from a computer point of view and from a training point of view, it makes little difference. [00:38:18] So it does depend, but actually, surprisingly, it’s not necessarily as important as people might think. [00:38:26] Chris Ward: [00:38:26] And on the subject of diversity, which you mentioned there a couple of times, and which is certainly something that is quite important and quite interesting here: how do you guarantee, or encourage, I suppose, is the better word, that there is a diversity of models? [00:38:45] Is it just making sure they’re there? And also, as part of that journey, did you find that the models were lacking in the first place in terms of diversity? [00:38:55] Chris Longstaff: [00:38:55] So today it’s still unfortunately quite a manual process. It’s about using your test cases, identifying where you’re failing, [00:39:08] and then trying to generate more data to overcome that. 
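The manual loop described here, running test cases, seeing which kinds of examples the model fails on, and then generating more data for those, can be illustrated with a small Python sketch. It is a simplified assumption of how such a check might look, not the tooling discussed in the interview: the function names, the group labels, and the 0.8 threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def recall_by_group(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the fraction of test cases detected correctly for each group.

    `results` holds (group_label, detected) pairs, one per test case.
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for group, detected in results:
        totals[group] += 1
        if detected:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def weak_groups(results: Iterable[Tuple[str, bool]], threshold: float = 0.8) -> List[str]:
    """List groups whose recall falls below the (hypothetical) threshold."""
    recalls = recall_by_group(results)
    return sorted(group for group, recall in recalls.items() if recall < threshold)


# Hypothetical test results: the detector does well on one group, badly on another.
test_results = (
    [("red car", True)] * 9 + [("red car", False)] * 1
    + [("pickup truck", True)] * 1 + [("pickup truck", False)] * 3
)
print(weak_groups(test_results))
```

Groups that come back from `weak_groups` are the candidates for extra synthetic data, though how representative the test cases are still has to be judged by a person.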
It’s definitely something that we are actively researching, and we hope that we will start to have better solutions than a fully manual one. So at the moment, it’s down to the skill of the data scientists to make sure that the test cases that they are using are representative, [00:39:33] to review the test cases and see the failings. Typically people think of it as not having enough, perhaps, people of different ethnic origins in there, that sort of thing, but it’s everything. Like cars: do you have too many red cars, which is actually throwing off what the machine’s understanding is of a car? Do you not have enough pickup trucks, [00:40:03] and is that throwing it off? All these kinds of things can be identified, but as I say, today it’s a manual process, and that’s definitely something that we are working on. [00:40:15] Chris Ward: [00:40:15] This might not be a question you want to answer, I’m not sure. And also it might just show my ignorance of how this could look in a real-world production kind of training environment. But my experience of having looked at [00:40:32] modeled scenes in the past, usually the sorts of things you see outside building plans and things like that, is that everything always looks a bit too perfect, and everyone looks kind of too perfect. And I just wonder, do you find those sorts of consequences with these models, that everyone just looks a little bit too unlike real people? [00:40:57] Chris Longstaff: [00:40:57] Definitely. It’s definitely, again, one of the things that you need to be cautious of, that everything is clean and perfect. And as I say, one of the things, for example, is making sure that we model the cameras, to model the camera noise. 
Because obviously when you have a camera in low light, you’re not picking up this clean, perfect image. [00:41:20] You’re picking up something which is noisy and has chroma speckling on it and so on. So that’s important. But you’re right in terms of the real world. If you’ve got 20 cars going past, and I look out of my window now, out of those 20 cars, probably 19 of them are a little bit dirty, [00:41:39] probably two of them have a big dent in them, probably three of them have stickers on the bumper. So you’re right, there is a lot of that. Now there are certain things that we are evolving the simulator for, to try to add more variations in that. Today we do it, but it is more of a manual process than we’d like. [00:42:03] Again, [00:42:04] Chris Ward: [00:42:04] programmatic chaos. [00:42:07] Chris Longstaff: [00:42:07] It’s a very good point you make, and it is definitely something that today we do handle, but it is not nearly as automated as we would like, and that’s something that is being developed right now, in terms of fixing that. [00:42:27] Chris Ward: [00:42:27] And just to clarify, I know you had a slide on this, but just to go into a bit more detail: where does Chameleon sit in the data scientist toolchain? I mean, you listed a few on the slide. Is it basically just replacing that kind of manual finding of datasets? [00:42:49] Chris Longstaff: [00:42:49] Pretty much. I mean, we see it definitely as a tool, if you like, in the data scientist’s toolbox. Because if you look at this diagram of the Chameleon platform, today what the data scientist does is they will get that customer’s real images box down at the bottom there. [00:43:09] And that’s, you know, 
an image, a labeled image there that you see with the outlined bounding boxes. And basically they’ll take that to train their network. So what we’re saying is, okay, that’s great, but you’re not going to have all the right data from that. So our dataset tools there [00:43:30] will, in fact, basically merge those real images with our synthetic images, automatically generate your test, your validation, your training datasets, and then you send that through to the neural network. The dataset tools will then also give you feedback in terms of, okay, these are the statistics. [00:43:56] So you’ve done your training, and now here’s the statistics, and that enables you then, and again, today it’s a manual process, the data scientist can then go, okay, that’s not quite right, I need some more data here, so I’ll add it. But taking, again, the example of coronavirus: today, a data scientist has probably got a dataset that he’s trained faces on and says, okay, [00:44:21] my face recognition algorithm works really well, and it works great across all different ethnicities, and I count the number of people. But now, hey, it’s not really working, or I want to actually differentiate those people with and without a face mask. We can very quickly generate that data. So that’s where that orientation comes in there. [00:44:44] Chris Ward: [00:44:44] So that’s a very good use case you gave, but in as much as you’re allowed to, are you able to mention any live real-world cases of people actually using the tool? [00:44:59] Chris Longstaff: [00:44:59] So, I mean, we have got our first customers for the tool. I can’t talk about the specific use case, but one customer is in the retail environment, [00:45:12] and one of the use cases that they are looking at is shelf occupancy. I’m understanding they do hang things. 
Understanding how shelves are occupied, how consumers take items from shelves. So wanting to build a picture of the way that that works. Another use case that someone has is, they’re trying to understand how long people go into a store for, and whether there’s a correlation between people who have purchased something and those that haven’t. [00:45:50] And I mean, that’s an incredibly complex problem. It’s trying to understand the amount of time someone goes in, it’s person re-identification, how you exactly identify, in an anonymous fashion, what they’ve done. But there’s that kind of thing where, again, it’s a combination of real and synthetic, but the [00:46:16] use of synthetic data, because you can re-identify people, helps them with that training. [00:46:23] Chris Ward: [00:46:23] And obviously this is a relatively new product anyway, but what’s on the roadmap for the next six months or so? [00:46:33] Chris Longstaff: [00:46:33] So there’s a number of different things. Obviously we don’t want to say too much publicly until they’re ready. [00:46:39] I’ve hinted that there’s certainly more automation coming, in terms of some of the things that we’ve discussed, both bias and the imperfections in the world. So there’s more automation coming from that point of view. We are looking at the visual quality as well. [00:47:00] And actually one of, um, so our, um. VP of AI strategy. I’m just in Rhonda and came from Microsoft. He’s got a lot of expertise with this, and one of the things he’s actually tasked with is trying to determine first what is required in terms of this, you know, the very good question you asked in terms of visual quality, and where does it matter and how does it matter? 
[00:47:25] But we are making improvements in the render pipeline in terms of that visual quality. [00:47:31] Chris Ward: [00:47:31] I have one final question that is probably somewhat important to my audience especially, which is, it looks like quite a fascinating tool and something that people will possibly want to try. Is it possible for anyone just to try, or is it by appointment only? [00:47:51] You know, is there a self-serve option? How can they, if they can? [00:47:57] Chris Longstaff: [00:47:57] So today it is a, you know, contact us and we can arrange something like that. We are considering whether we can, and it’s something that I think we will do because I’m pushing it, open up some data packs, [00:48:15] so to give some example images and annotations that we just allow anyone to try. [00:48:22] Chris Ward: [00:48:22] It’ll also help you get feedback on some of those as well. [00:48:25] Chris Longstaff: [00:48:25] So, yeah, exactly. That’s definitely within the plan, but obviously we just need to make sure that we have the right structure in place. Obviously we are not a massive company yet, so we can’t support a thousand users trying to use that and asking us questions, and we don’t want to put something out there which then just annoys people because they say, well, I can’t use it because I don’t have the right formatting. [00:48:49] How do I do this? I’m getting ignored by Mindtech. So we do want to do something like that, but obviously we’re just making sure we do it in the right manner. [00:48:59] Chris Ward: [00:48:59] That was my interview with Chris Longstaff of Mindtech. Hope you enjoyed that. Okay. What’s new? Well, I’m still working on the storytelling podcast, [00:49:08] the Board Game Jerk podcast, based on my Board Game Jerk Twitter bot, which you can find where you would expect on Twitter. 
We did some test recordings of that. We are working through that. I am starting the solo game live stream on my Twitch channel. Don’t think it’s on my website. I should get this stuff on my website [00:49:27] at some point in the next few days as well. I’m also taking part in the critic test dummies live stream, which you can also find on Twitch. So have a look there. I will try to put all these links on my website as much as possible. I have a few articles in progress; I will just keep them on my website, [00:49:43] I think, because it’s the easiest thing to do. I need to update it. I also need to update the look and feel of it a bit. I’ve been struggling to figure out how to put in other sorts of content, like update content, other things I do and stuff like that. I need to have a blog for my blog, I think. Anyway, various things in progress there. [00:50:02] Keep an eye on my various social profiles, which you can find on dot com. Please rate, review, share. If you enjoyed the show, I’d love to hear from you. Again. You can find and until next week. Thank you very much [00:50:20] for listening.