Telmo Silva created ClicData, an end-to-end SaaS BI platform, which, as he describes it, is the little guy coming up in the BI platform world. He talks about how his company was started, where it’s been, and where it’s going with cutting-edge R&D. He also offers additional thoughts on the role of data in the business world today.

Telmo Silva: That’s the next thing that we’re trying to see, “well, if we don’t have enough volume to start detecting these patterns in a statistically meaningful way, can we help it with some human-generated models that potentially could help us advance faster?”

Ginette: I’m Ginette,

Curtis: and I’m Curtis,

Ginette: and you are listening to Data Crunch,

Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world.

Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics, training, and consulting company.

Curtis: So, Telmo, thank you again for joining me. Everybody, we have Telmo Silva here from ClicData to talk to us about a host of interesting things. I just want to start off with getting an introduction from you and who you are, where you’ve been, and we can kick it off.

Telmo: Well, thanks, thanks, thank you for having me here, Curtis. I appreciate that. Yeah. So again, my name is Telmo Silva. I am the CEO and founder of ClicData. For those of you that have not come across ClicData, we, we claim to be, and we are hoping to be the number one cloud-based business intelligence end-to-end platform.

And the way we envisioned this platform, many years ago, as I was dealing with, with pharmaceutical marketing sales and medical systems was the fact that I was constantly being faced with the challenge of implementing large-scale business intelligence data warehouse systems only to go back to the users and to the affiliates and to the different countries to see them export things into your favorite spreadsheet program. And that was frustrating.

And as such, you know, I grabbed a bunch of data specialists from large retail companies and experienced data scientists (back then there was not even that terminology of data scientists), .NET developers, and so forth. And I put them all in one room and I said, listen, I have this vision. And I’d like to start something called ClicData. And this is a vision.

And, you know, 10 years later here we are. We launched it about five years ago. A truly cloud platform, and our biggest differentiator really in our focus is to say as much as we can make data beautiful, and there’s some beautiful visualization tools out there and even components, the problem has always been the data, you know, how to get to the data, how to massage the data, how to cleanse the data and so forth. And it’s 70% of our time is just getting those . . . the database. Everybody can make pretty charts to different tastes and different, obviously, skill, but nonetheless, a lot of the problem is getting to the data.

So we wanted to kind of really bring a BI platform end-to-end to the masses. And that’s what ClicData is about. 

I’ve been doing that for a long time, and, you know, along the way, we start developing other interesting side things, which, you know, obviously in these last 10 years, things such as machine learning, artificial intelligence, data science, all came to flourish, and we’re, trying to navigate that sea of new technology and new areas that we need to potentially get into as time goes.

Curtis: You’re mentioning data preparation, and you know, making sure the data is clean and everything, which as anyone in the data field knows is absolutely the worst part of your job. And it’s the biggest part of your job.

But I’m curious, now that you’ve had a business in this for a long time. One of the hardest things that I’ve seen from a data science perspective is trying to describe the complexity and the difficulty of data preparation to people that are not data people. Right? So they’re like, “Why can’t you do this faster?” Like, look, because the data is a total mess, and it’s hard, right? So I’m curious, how do you present that to people in a way that they get the problem if they’re not in data? That may be a good perspective for a lot of people.

Telmo: Yeah. I think that’s a great topic and definitely a challenge of ours. Right? I mean, if you look at the history of how people have done reporting, you would go, “Oh, you know, we have this local database, let’s just, you know, plug in some tool, Crystal Reports or whatever, you just write against it. And we do all the querying and all that stuff.” And as companies are moving to the cloud, and specifically to different cloud systems, picking the best of breed for your ERP, for your CRM, for your transactional data, now, you know, we’ve potentially solved a lot of problems by going to the cloud and potentially even facilitated the use of large-scale computing systems. But now we have this problem that our data no longer resides in something that is easy to get to.

So how do you explain to a business owner who is basically just interested in knowing how many couches they’ve sold that we have to hit an API, that there’s a throttling of 100 rows per second? And by the way, that’s how we’re limited, right? We cannot go faster than your vendor is going to give us the data. Right?
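
To put rough numbers on the throttling point: at a cap of 100 rows per second, the minimum extraction time is set by the vendor’s API, not by the BI tool. The sketch below is only an illustration. The function name and the table sizes are hypothetical, and only the 100 rows per second figure comes from the conversation.

```python
def min_extract_seconds(total_rows: int, rows_per_second: int = 100) -> float:
    """Lower bound on the time needed to pull total_rows through a rate-limited API."""
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return total_rows / rows_per_second


# Hypothetical table sizes, chosen purely for illustration.
for rows in (10_000, 1_000_000):
    seconds = min_extract_seconds(rows)
    print(f"{rows:>9,} rows -> at least {seconds / 60:.1f} minutes")
```

Under these assumptions, a million-row table needs close to three hours of pure transfer time before any cleansing or visualization can start.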

How do we explain to somebody that has a marketing agency that because Facebook got hit with, you know, a bunch of problems related to Cambridge Analytica and everything else, now they’ve locked down their APIs so tight that for every change, every new data element that we want to add, we need to obtain approval from them, as vendors, to get that data on their behalf. Right? So it becomes very quickly a technical discussion, which many of our customers and many of the business users are potentially not ready for.

At the same time, we also see a growth. You know, we had the age of the folks that would print their emails because, you know, they didn’t quite understand the concept of email.

We had also the age of the Skype users that started using Skype in businesses that did not allow for Skype. And now we’re in the age of people that potentially already know what Python is. Right? So I think there’s also a progression of understanding among our customers, and the folks that are seeing that data is no longer just a side effect of having a transactional system in place.

It’s actually a commodity. It’s something that they need to rely on and why not even commercialize. So they themselves will improve over time, in, in teaching themselves these types of technologies, but, you know, we use a variety of techniques to try to explain those difficulties to say, “yes, I understand that you’re trying to match this data set with that dataset, but by the way, there’s no link.”

Right?

So you’re going to have to make some very good assumptions as to how you’re going to link those data sets. And, you know, my dream initially was to build a product-only company. And as it turns out, what I’m doing these days is actually hiring more data scientists and more customer data experts that interact with the customers, that explain and work with the customers.

So this thing of being just a tool and just saying, I’m going to buy this tool and it’s going to solve my problem, that may cut it for some, but other people do need more training, and they need our expertise as well in explaining to them why those things don’t happen overnight sometimes.

Curtis: Data is so varied and there’s so many use cases and so many systems that it’s almost impossible to have one piece of software that solves it all. Although, I’m curious. So, so ClicData, then, it’s a sort of a, a combination of technology and consulting is that fair where you, you use your technology to make the consulting you do easier essentially? And more effective?

Telmo: Yeah, I would like to think that, uh, we’re a product first company. We are developing a true BI platform, but we found that we get best results or rather our customers get better results, for some types of customers, if we have staff that is able to translate, you know, data, data issues, and visualization issues and security issues, in normal day-to-day language and having, an entire team, which we call our professional services team dedicated for that either in training or onboarding.

So it’s rather the other way around. But, but yeah, you know, some days I question whether, you know, it’s the tool helping the team, or is the team helping the tools sometimes.

Curtis: Yeah, fair. There’s interplay there for sure. And I see that in a lot of products that are related to data science, actually: oftentimes you have to have this arm that is human based ’cause there’s just too much variability a lot of times. But could we dive into some of the capabilities that you’ve, I mean, you’ve done this for 10 years, so you’ve built stuff into this tool to solve, I’m assuming, a lot of sort of common problems, but also using AI and ML and predictive analytics, for example, to overcome some issues. So I’d love to dive into what you’ve seen and done.

Telmo: Yeah, absolutely. I mean, at its core, what ClicData allows you to do is basically, we offer a slew of connectors to different APIs and different systems. So we have over 60 connectors or so, and then we have these generic connectors, like our web service connector, that allow you to connect to GraphQL and a variety of other, hundreds of other APIs that exist out there.

So that’s the first part that we found is how do we get that data that’s sitting either behind a firewall in a, in some kind of a database, or it sits on, you know, a cloud-based system or cloud storage, or what have you. And how do we kind of bring that in and kind of create that, that, you know, for lack of a better word, a, an ad hoc data warehouse.

And that was the second step that we had to decide. Did we go with a default model of a data warehouse, where we have to sit with a customer and say, okay, what are your metrics? What are your dimensions? What are your hierarchies? What are your filters, et cetera? Or are we going to start going into more, you know, a new age of thinking, which is: let’s dump everything into some kind of data lake, call it whatever you want, but let’s build as many indexes and get some understanding of that data ourselves and try to come up with algorithms that facilitate the querying of that. And that’s the second piece: either allowing the customers to start massaging the data themselves, cleansing it using the typical ETL tools that we are used to, either SQL or our designer, to start cleansing the data, enriching the data, joining the data, fusing the data, et cetera. But then in addition to that, have a parallel function, which kind of starts understanding what our customers are doing with their own data.

And then obviously the third piece, which is, “okay, great. You have all these data sources connected. They’re all in our databases. How do we visualize that?” So a dashboard editor, which is, you know, non-OS specific that runs on any browser.

And then the final piece is, you know, there’s no point in having an amazing dashboard if you’re the only one looking at it. Right? So the distribution, the security of that, passing it, embedding it into your other systems, your portals, your websites, et cetera. Right? So that entire thing is the BI platform.

But when we initially started, we had a freemium, we had over 30,000 accounts running at the same time. And we did that, not because, you know, . . . It was a period of trial to understand (a), can we scale intelligently? That means how can we hold all these accounts’ data in different regions, respecting data protection regulations and laws such as GDPR, the California protection, HIPAA, et cetera.

Can we scale easily? If one region, let’s say the US, is growing very rapidly, people are adopting our platform, can we scale rapidly? Or are we going to start just paying a lot more money for our database licenses and things like that? Can we somehow make it linear, at the very minimum?

The other thing is, will everybody have the capacity to understand what a transformation or a SQL command is, et cetera? Can we help them along the way with either UI or intelligent recommendations? So, for example, if we have two accountants joining ClicData, right? The first accountant is very knowledgeable: he uploads his Excel or his data from QuickBooks or one of the financial packages. And then he starts doing some charts and graphics and dashboards. Can we somehow capture the intelligence of how he built that, whether the way he visualized it or the way she manipulated the data, in such a way that the next accountant that joins ClicData can benefit from that understanding without revealing, obviously, the underlying data?

Can we look at two Excel spreadsheets and determine if their DNA is very similar, in such a way that I can use the same processes that I used on this Excel, knowing that an Excel is a highly non-schema-fixed type of structure, and that it could be in a different language for all we know, or not even have column titles?

And, but can we have a DNA footprint of that and apply, “Oh, somebody just uploaded and connected his data, which is very much like this other data. Can we recommend these type of cleansing and joining and visualizations to this data as we did for this one?”
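
One way to picture the “DNA footprint” idea is a fingerprint built only from the kinds of values in each column, so that it still works when column titles are missing or in another language. This is a minimal sketch under that assumption, not ClicData’s actual method, which is not described in the conversation. The function names, the 90% threshold, and the sample sheets are all hypothetical.

```python
from typing import List


def _is_number(text: str) -> bool:
    """True if the cell parses as a number once thousands separators are removed."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def infer_kind(values: List[str]) -> str:
    """Classify one column as 'empty', 'numeric', or 'text' from its cell values."""
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return "empty"
    numeric_share = sum(_is_number(c) for c in cells) / len(cells)
    # Hypothetical threshold: tolerate a few stray non-numeric cells.
    return "numeric" if numeric_share >= 0.9 else "text"


def fingerprint(columns: List[List[str]]) -> List[str]:
    """Describe a sheet by the kind of each column, in order, ignoring titles."""
    return [infer_kind(col) for col in columns]


def similarity(a: List[str], b: List[str]) -> float:
    """Share of column positions whose kinds agree, relative to the wider sheet."""
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


# Two hypothetical sheets with no headers, one with English and one with Dutch labels.
sheet_en = [["Sofa", "Chair"], ["1200", "300"], ["2020-01-01", "2020-01-02"]]
sheet_nl = [["Bank", "Stoel", "Tafel"], ["1.5", "2", " 7"], ["jan", "feb", "mrt"]]

print(fingerprint(sheet_en), fingerprint(sheet_nl))
print(similarity(fingerprint(sheet_en), fingerprint(sheet_nl)))
```

A real system would need far richer features (value distributions, date detection, cardinality), but even this crude fingerprint shows how two sheets can be matched without reading their contents for meaning.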

So that is kind of one of the things that we’re working on avidly, because this is really truly the democratization of, in a sense, a certain level of data science, meaning at least in the data preparation area and data visualization, the best practices, if you will, of how to present and modify your data so that it becomes usable quickly, which a lot of our customers still, you know, struggle with, because that’s not their core business. Obviously their core business is whatever, you know, they’re building or providing services for. So that is definitely one area where we see a huge potential for us, again, because we are a SaaS BI platform. We have all this data that we can work with, as opposed to individual, you know, BI implementations, which are really disconnected from each other. They don’t see what other companies are doing, or collect that intelligence and provide it to others.

Curtis: Got it. I mean, that’s a super valuable problem to solve. Have you found, I’m just curious, and you know, if anything’s proprietary, obviously you don’t share anything that you don’t want to, but how are you approaching that? Have you found it successful to be able to say, like you’re saying, here are two different spreadsheets, we see patterns, and therefore, you know, we bubble up to the user like, “Hey, you should shift this this way, this, this way.” Or maybe you do that automatically. You know, how are you doing that? That’s such a massive problem to try and solve.

Telmo: Yeah, no, absolutely. I think our biggest problem right now is, ClicData is kind of the, the little guy coming up. And one of the things that we need is more data. And interestingly enough, when, when you have such disparate industries, as we do as, as customers, it’s, it’s still hard to kind of cluster those footprints, if you will, in such a way that is statistically meaningful at this point in time. And as much as we, we want to advance that. So that’s been one of our biggest challenges.

One thing that has helped us quite a bit, though, and something that we actually are looking at, is, again, as with most machine learning algorithms, to get humans to help out. So we have a lot of data scientists, either through customers or externals, that have already done a lot of work with different tools and different data sets.

And that’s the next thing that we’re trying to see, “well, if we don’t have enough volume to start detecting these patterns in a statistically meaningful way, can we help it with some human-generated models that potentially could help us advance faster,” right? Until we’re better known, or we have a lot more customers, so that we can start making these assumptions a little bit more strict.

Nonetheless, we’re still in the middle of it. And it’s something that our engineers are kind of, attempting to do in the middle of all the other issues that they have to fix. It’s definitely one of our biggest investments in the next couple of years to see how we can really have an Excel being uploaded in any language and kind of say, “well, I think you want to do this” and, expecting. Right?

Curtis: Yeah. Yeah, that would be huge. And you did mention a couple of other things that you’re doing in addition to that to help this data preparation process, like using NLP, right? To have better queries, user queries, classification, these kinds of things. Can you talk a little bit about those techniques and how that also helps the process?

Telmo: Yeah, absolutely.

We’ve seen natural language processing. There’s actually quite a few tools out there. Some of them have been embedded in tools such as Tableau and Power BI, where the users can type their queries in English or in their mother tongue and the dashboard or the visualization appears, or at least suggestions of visualizations of the data. And I think that has progressed a lot in the last few years, but I also find it’s been kind of a very niche and a very experimental thing.

I still don’t see it in full use in many cases yet, but I do feel that that is going to be great in terms of people stop using, you know, SQL and having to learn SQL as, as good a language as it is, or any other language, if SQL doesn’t cut it, any type of query technical language and start using, natural language to get their queries. I think that is one of the areas that needs to, to advance faster in my opinion. And I think it’s going to, I think just much like we’ve seen in other areas, I think it’s going to advance quite, quite fast if it hasn’t already, in certain areas.

The inverse of that is actually the system generating language, generating descriptive text. I find that fascinating because it removes human bias, in a sense, from people reading charts, you know. I’ve always had this great story, which is from the earliest phases of ClicData. I was going around, you know, basically Silicon Valley looking for investors and saying, “Hey, we got this great idea, ClicData. You guys want to look at us,” et cetera. And the joke was, you know, I would show our MRR, you know, the growth of recurring revenue. And they said, “well, that seems to be pretty slow.” And I would just shrink the chart like this and, you know, just shrink the horizontal axis, and they’ll go, “Oh yeah, that one, it looks much better now.” Yeah, I know. Right?

So this is the type of tricks that, unconsciously or consciously humans do to kind of fake their perception of the data. Right?

Pie charts, we talk about Stephen Few, a great data visualizer. I mean, I, I’m not going to go into all the examples how you can mislead the brain into thinking certain things just by looking at a chart that has been done in a certain way or in a certain color.

And there’s nothing more pure in the sense of a computer actually analyzing that statistically looking at the curve and saying, “you know, something happened here. This has gone down 60%.” And putting that in the language that is easily understandable, and generating that.
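
The “this has gone down 60%” example can be sketched as a tiny template-based generator: compute the change, then put it into a plain sentence that does not depend on how a chart is scaled. This is an illustrative sketch only. The function name, the wording of the sentences, and the sample numbers are hypothetical, and real natural language generation systems are considerably more involved.

```python
def describe_change(label: str, previous: float, current: float) -> str:
    """Turn two values of a metric into a one-sentence plain-language summary."""
    if previous == 0:
        # A percentage change from zero is undefined, so say so instead of guessing.
        return f"{label} moved from 0 to {current:g}; a percentage change is undefined."
    pct = (current - previous) / abs(previous) * 100
    if pct == 0:
        return f"{label} was unchanged at {current:g}."
    direction = "up" if pct > 0 else "down"
    return f"{label} went {direction} {abs(pct):.0f}% (from {previous:g} to {current:g})."


# Hypothetical weekly sales figures.
print(describe_change("Sales", 500, 200))
```

Because the sentence is computed from the numbers themselves, stretching or shrinking an axis cannot change what it says.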

So natural language generation to me is something that is of extreme value, not to mention the fact that it can now open the doors for people with visual impairments, et cetera, or even, you know, facilitate the fact that you don’t have to take your eyes off the road as you’re driving, hearing it being spoken to you, your sales for the last week of – I mean, this is stuff that you see in movies that is, you know, becoming reality, and we like that, but that is the kind of stuff that is in the works.

There’s nothing. I’m not saying that there’s nothing new there. There’s a lot of stuff still to be done there, but it’s definitely not groundbreaking.

What I really like is better data connectivity and performance via predictive. This stuff of understanding what you’re connecting to and having the metadata come alive. And this DNA thing that we’ve talked about before. This understanding, giving the platform the understanding by looking at things such as Wikipedia, by looking at the opendata.orgs of this world and scanning this data and saying, “well, I’m fairly sure I know what this data is.” Basically, you’ve uploaded some sales data for this product, which is obviously in the retail industry, or in this sector, et cetera. And by the way, I can pull out all the logos for you. I can pull out additional complementary information for you from free data sources out there. And enriching that data automatically without the user even having to spend time.

That stuff to me is really a killer feature for me. Because, it’s really taking advantage of everything that potentially machine lang . . . machine learning has to offer, which is giving the power of understanding, understanding in a, in a very, quoted way, the fact that this user is trying to use this data and he made links to other data sets automatically, even data sets that don’t exist internally, but outside available on the internet. Right?

But the challenge with all of them today, by the way, is the fact that they’re all very domain specific. With natural language processing it’s easy to say, “give me the sales of last year compared to this year.” And boom, Power BI comes up with that. And it’s easy because there’s probably just one table or one data set called “sales” in there. Right?

So that’s easy to do, but what happens if you’re doing this for a multitude of customers? What if sales no longer really means sales, but different, there’s semantics involved, or whether you’re doing this, for example, on ClicData, where we host a variety of data sets, and again, because of privacy and so forth, we’re also limited in terms of what we can expose and understand about them.

Does that domain contextual info of the dataset of our customer come into play? Yes, it does. It now opens up the doors. It goes, what are they talking about? Right? Like sales, what sales? There’s too many sales here. And the quality of the content as well: again, in a very highly implemented data silo, a company can definitely clean their data much, much better than a normal company would that does not have the skills or the resources to do it. So, these machine learning algorithms will behave very poorly, mostly because the data is just not up to par.

And then let’s not even talk about language, right? I mean, with, with the different languages and everything else that we have, you upload a spreadsheet of Japanese brands versus one in English, one in Dutch, and you’re trying to make sense of it all. And then you mix and match. This is really tough and advanced stuff and I think that’s why it excites me at least, because it’s so challenging to get all these things in place with all these variables.

Curtis: So, so maybe a point here on, this thing that you’re most excited about, obviously there’s a lot of ground to cover. What do you feel like you have been the most successful at in terms of creating something with this predictive usage analytics or, whatever it is that’s really generated the best outcomes in your mind that you’ve been able to do based on all your research and all your work?

Telmo: Interestingly enough, the best stuff that we’ve come up with in terms of our predictive use of it has been the predictive use of our platform. It’s indirectly affecting our customers, but it’s not visible to our customers. It is really the understanding of when people use the platform around the world, when we can scale up, when we can scale down, where we need improvements in code, and all those internal processes that allow us to be highly functional, highly scalable, and yet at the same time highly efficient with our resources.

So it’s been something internal that we’ve been using. It’s really, all the algorithms we’ve put around understanding what are the periods that people use it? What type of queries do they use? What type of data? What are the data size roadblocks? I mean a lot about the data and how it’s stored in the database.

We use Microsoft Azure for all our data across the world. And it is very important, both from a cost perspective and from a performance perspective, that we can continue to optimize those items in such a way that it becomes a viable option for many of our customers.

So, it’s all the data that we’ve been able to collect and analyze in the usage of our platform that has been the most useful for us at this point in time. Nothing yet customer-facing that the users can say, “Oh, you know, there’s this new feature, NLP or NLG, natural language generation, or something.” We don’t have that yet. And, for a variety of reasons, again, one of them being the fact that our customers are very diverse and that’s really not the core of their interest at this point in time.

But it’s rather the internal use of all those items. You know, case in point, in terms of identifying optimal settings by comparing different uses of databases across all our customers: we can compare 50 databases and monitor them independently with sample groups and find out which ones are reacting better to the code that we have, to the application that we have in place. And kind of start determining the settings and configurations of each one, and optimize them, and then repeat the process and continue to do that iteratively until we find the best configuration for every single database that we have, while at the same time allowing for the fact that there may be some customers that have NoSQL data versus very structured data, and those parameters need to be changed accordingly as well. So that has been probably one of our biggest success factors.
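
One round of the compare-and-repeat loop described here could look like the sketch below: each sample group of databases runs under one candidate setting, and the setting with the lowest median query latency wins that round. The conversation does not say which metric or settings ClicData actually uses, so the function name, the setting names, and the latency figures are all hypothetical.

```python
from statistics import median
from typing import Dict, List


def best_setting(latencies_by_setting: Dict[str, List[float]]) -> str:
    """Return the setting whose sample group shows the lowest median latency."""
    if not latencies_by_setting:
        raise ValueError("at least one setting is required")
    return min(latencies_by_setting, key=lambda s: median(latencies_by_setting[s]))


# Hypothetical query latencies in seconds, one list per sample group of databases.
measurements = {
    "cache_small": [1.9, 2.4, 2.1],
    "cache_large": [1.2, 3.0, 1.4],
}

winner = best_setting(measurements)
print(f"Keep '{winner}' for the next round and test new candidates against it")
```

The median is used here so that one slow outlier database does not decide the round. Repeating the comparison with fresh candidates, and separately for differently shaped workloads, gives the iterative tuning described above.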

Curtis: That’s great. And like you said, I mean that that has real benefits though, right? I mean, I’m assuming that saves a bunch of costs. And also when they’re doing queries they’re getting returns faster. Which when you’re investigating data return time is really important because if you’re waiting a couple of minutes every time you run a query you’re not going to get anything done. Those are the kinds of gains we’re talking about.

Telmo: Absolutely. Absolutely. I mean, yeah. See, if you can’t return a dashboard in under a few seconds, I’m not sure we’re going to keep those customers for a long time. Right? And obviously the same way, I mean, we love Microsoft and other cloud storage and technologies, but they’re expensive. There are a lot of servers around the world that we keep, and we need to optimize that cost, as most companies do. It’s important to always have that cost-benefit ratio, right?

Curtis: Is there anything that you had wanted to cover or think is really interesting that you wanted to cover?

Telmo: No, I just, it, you know, again, we live in an interesting time where, you know, everybody talks about data science, data viz, data BI, big data, embedded analytics, democratizing BI. I mean, you name the term. It’s probably been, you know, invented, created, et cetera, but ultimately, when you’re a business and all you want to do is wake up in the morning and know how did I do yesterday? Or how did my team do?

I find it interesting that even after, you know, 30, 40 years of database and data warehouse technologies being in the market, BI being a terminology, we still have this challenge today. And I think it’s because we change the rules of the game constantly. Right? Just as we were getting to a certain point to say, “Oh yeah, everybody’s going on premise,” you go, “no, everybody goes on cloud.” Okay. That changes the whole thing again. Right?

So it’s interesting to see that it’s a constantly evolving channel, you know, and market and sector. I think there’s a lot of confusion as to what machine learning is and what artificial intelligence is, big data, IoT, all these terminologies. I think, you know, it’s time as well, and it’s through podcasts like yours, through education, that I think it’s important for people to start learning for themselves and to understand more about what those things mean, both from a knowledge perspective but also from a practical perspective: what can you do for my business?

And that’s ultimately what it comes down to: it’s about helping businesses. It’s not about building technologies and visualization just because. There’s a lot of cool tools that do that. We do things with a purpose, which is to facilitate as much as possible your data acquisition all the way to making decisions on that data. Right. Yeah, that’s it.

Curtis: That’s great. Without a decision, nothing matters. Right? Without action, nothing is going to get done. So, and by the way, when you finally solve the data preparation problem, call me ’cause I want to have you back on the show and like, tell me how you did it.

Telmo: I’ll keep you updated, Curtis.

Attributions

Music

“Loopster” Kevin MacLeod (incompetech.com)

Licensed under Creative Commons: By Attribution 3.0 License

http://creativecommons.org/licenses/by/3.0/

Telmo Silva created ClicData, an end-to-end SAAS BI platform, which as he describes, is the little guy coming up in the BI platform world. He talks about how his company was started, where it’s been, and where it’s going with cutting-edge R&D. He also offers additional thoughts on the role of data in the business world today.

Telmo Silva: That’s the next thing that we’re trying to see, “well, if we don’t have enough volume to start detecting these patterns in a statistically meaningful way, can we help it with some human-generated models that potentially could help us advance faster?”

Ginette: I’m Ginette,

Curtis: and I’m Curtis,

Ginette: and you are listening to Data Crunch,

Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world.

Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics, training, and consulting company.

Curtis: So, Telmo, thank you again for joining me. Everybody, we have Telmo Silva here from ClicData to talk to us about a host of interesting things. I just want to start off with getting an introduction from you and who you are, where you’ve been, and we can kick it off.

Telmo: Well, thanks, thanks, thank you for having me here, Curtis. I appreciate that. Yeah. So again, my name is Telmo Silva. I am the CEO and founder of ClicData. For those of you that have not come across ClicData, we, we claim to be, and we are hoping to be the number one cloud-based business intelligence end-to-end platform.

And the way we envisioned this platform, many years ago, as I was dealing with pharmaceutical marketing, sales, and medical systems, came from the fact that I was constantly being faced with the challenge of implementing large-scale business intelligence data warehouse systems, only to go back to the users and to the affiliates and to the different countries and see them export things into their favorite spreadsheet program. And that was frustrating.

And as such, you know, I grabbed a bunch of data specialists from large retail companies, experienced data scientists (back then there was not even that terminology of data scientists), .NET developers, and so forth. And I put them all in one room and I said, “Listen, I have this vision. And I’d like to start something called ClicData.” And this is a vision.

And, you know, 10 years later, here we are. We launched it about five years ago. A truly cloud platform, and our biggest differentiator, really, and our focus, is to say: as much as we can make data beautiful, and there are some beautiful visualization tools out there and even components, the problem has always been the data, you know, how to get to the data, how to massage the data, how to cleanse the data, and so forth. And 70% of our time is just getting those . . . the database. Everybody can make pretty charts to different tastes and different, obviously, skill, but nonetheless, a lot of the problem is getting to the data.

So we wanted to kind of really bring a BI platform end-to-end to the masses. And that’s what ClicData is about. 

I’ve been doing that for a long time, and, you know, along the way, we started developing other interesting side things, because, you know, obviously in these last 10 years, things such as machine learning, artificial intelligence, and data science all came to flourish, and we’re trying to navigate that sea of new technology and new areas that we need to potentially get into as time goes.

Curtis: You’re mentioning data preparation, and you know, making sure the data is clean and everything, which as anyone in the data field knows is absolutely the worst part of your job. And it’s the biggest part of your job.

But I’m curious, now that you’ve had a business in this for a long time. One of the hardest things that I’ve seen from a data science perspective is trying to describe the complexity and the difficulty of data preparation to people that are not data people. Right? So they’re like, “Why can’t you do this faster?” Like, look, because the data is a total mess, and it’s hard, right? So I’m curious, how do you present that to people in a way that they get the problem, if they’re not in data? That may be a good perspective for a lot of people.

Telmo: Yeah. I think that’s a great topic and definitely a challenge of ours. Right? I mean, if you look at the history of how people have done reporting, you would go, “Oh, you know, we have this local database, let’s just, you know, plug in some tool, Crystal Reports or whatever, and you just write against it. And we do all the querying and all that stuff.” And as companies are moving to the cloud, and specifically to different cloud systems, picking the best of breed for your ERP, for your CRM, for your transactional data, now, you know, we’ve potentially solved a lot of problems by going to the cloud and potentially even facilitated the use of large-scale computing systems. But now we have this problem that our data no longer resides in something that is easy to get to.

So how do you explain to a business owner, who basically is just interested in knowing how many couches they’ve sold, that we have to hit an API, and that there’s a throttling of 100 rows per second? And by the way, that’s how we’re limited, right? We cannot go faster than your vendor is going to give us the data. Right?
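To put that throttle in numbers, here is a minimal back-of-the-envelope sketch. It assumes the 100 rows per second mentioned above is a hard ceiling and ignores latency, retries, and pagination; the function names are illustrative and not part of any vendor’s API.

```python
def transfer_seconds(total_rows: int, rows_per_second: float = 100.0) -> float:
    """Lower bound on the time needed to pull `total_rows` through a rate-limited API."""
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return total_rows / rows_per_second


def human_readable(seconds: float) -> str:
    """Format a duration in seconds as hours, minutes, and seconds."""
    hours, remainder = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


if __name__ == "__main__":
    # Hypothetical table sizes; at 100 rows/s, one million rows take 2h 46m 40s.
    for rows in (10_000, 1_000_000, 50_000_000):
        print(f"{rows:>12,} rows -> {human_readable(transfer_seconds(rows))}")
```

The point for a business owner is that the wait is set by the vendor’s limit and the row count, not by how fast the BI platform itself runs.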

How do we explain to somebody that has a marketing agency that, because Facebook got hit with, you know, a bunch of problems related to Cambridge Analytica and everything else, they’ve now locked down their APIs so tight that for every change, every new data element that we want to add, we need to obtain approval from them, as vendors, to get that data on their behalf? Right? So it becomes very quickly a technical discussion, which many of our customers and many of the business users are potentially not ready for.

At the same time, we also see a growth. You know, we had the age of the folks that would print their emails because, you know, they didn’t quite understand the concept of email.

We also had the age of the Skype users that started using Skype in businesses that did not allow for Skype. And now we’re in the age of people that potentially already know what Python is. Right? So I think there’s also a progression of understanding among our customers and the folks that are seeing that data is no longer just a side effect of having a transactional system in place.

It’s actually a commodity. It’s something that they need to rely on and, why not, even commercialize. So they themselves will improve over time in teaching themselves these types of technologies, but, you know, we use a variety of techniques to try to explain those difficulties, to say, “Yes, I understand that you’re trying to match this data set with that data set, but by the way, there’s no link.”

Right?

So you’re going to have to make some very good assumptions as to how you’re going to link those data sets. And, you know, my dream initially was to build a product-only company. And as it turns out, what I’m doing these days is actually hiring more data scientists and more customer data experts that interact with the customers, that explain and work with the customers.

So this thing of being just a tool and just saying, I’m going to buy this tool and it’s going to solve my problem, that may cut it for some, but other people do need more training, and they need our expertise as well in explaining to them why those things don’t happen overnight sometimes.

Curtis: Data is so varied and there’s so many use cases and so many systems that it’s almost impossible to have one piece of software that solves it all. Although, I’m curious. So, so ClicData, then, it’s a sort of a, a combination of technology and consulting is that fair where you, you use your technology to make the consulting you do easier essentially? And more effective?

Telmo: Yeah, I would like to think that, uh, we’re a product-first company. We are developing a true BI platform, but we found that we get the best results, or rather our customers get better results, for some types of customers, if we have staff that is able to translate, you know, data issues, visualization issues, and security issues into normal day-to-day language, and if we have an entire team, which we call our professional services team, dedicated to that, either in training or onboarding.

So it’s rather the other way around. But, but yeah, you know, some days I question whether, you know, it’s the tool helping the team, or is the team helping the tools sometimes.

Curtis: Yeah, fair. There’s interplay there for sure. And I see that in a lot of products that are related to data science: oftentimes you have to have this arm that is human-based, ’cause there’s just too much variability a lot of times. But could we dive into some of the capabilities that you’ve, I mean, you’ve done this for 10 years, so you’ve built stuff into this tool to solve, I’m assuming, a lot of, sort of, common problems, but also using AI and ML for predictive analytics, for example, to overcome some issues. So I’d love to dive into what you’ve seen and done.

Telmo: Yeah, absolutely. I mean, at its core, what ClicData allows you to do is basically this: we offer a slew of connectors to different APIs and different systems. So we have over 60 connectors or so, and then we have these generic connectors, like our web service connector, that allow you to connect to GraphQL and a variety of other, hundreds of other APIs that exist out there.

So that’s the first part that we found is how do we get that data that’s sitting either behind a firewall in a, in some kind of a database, or it sits on, you know, a cloud-based system or cloud storage, or what have you. And how do we kind of bring that in and kind of create that, that, you know, for lack of a better word, a, an ad hoc data warehouse.

And that was the second step that we had to decide. Did we go with a default model of a data warehouse, where we have to sit with a customer and say, “Okay, what are your metrics? What are your dimensions? What are your hierarchies? What are your filters, et cetera?” Or are we going to start going into more, you know, a new age of thinking, which is: let’s dump everything into some kind of data lake, call it, again, what have you, but let’s build as many indexes and get some understanding of that data ourselves and try to come up with algorithms that facilitate the querying of that. And that’s the second piece: allowing the customers to start massaging the data themselves, cleansing it using the typical ETL tools that we are used to, either SQL or our designer, to start putting things in, cleansing the data, enriching the data, joining the data, fusing the data, et cetera. But then, in addition to that, we have a parallel function, which kind of starts understanding what our customers are doing with their own data.

And then obviously the third piece, which is, “okay, great. You have all these data sources connected. They’re all in our databases. How do we visualize that?” So a dashboard editor, which is, you know, non-OS specific that runs on any browser.

And then the final piece is, you know, there’s no point in having an amazing dashboard if you’re the only one looking at it. Right? So the distribution, the security of that, passing it, embedding it into your other systems, your portals, your websites, et cetera. Right? So that entire thing is the BI platform.

But when we initially started, we had a freemium, and we had over 30,000 accounts running at the same time. And we did that, not because, you know, . . . It was a period of trial to understand (a), can we scale intelligently? That means, how can we hold all these accounts’ data in different regions, respecting data protection regulations and laws such as GDPR, the California protection, HIPAA, et cetera?

Can we scale easily? If one region, let’s say the US, is growing very rapidly, and people are adopting our platform, can we scale rapidly? Or are we going to start just paying a lot more money for our database licenses and things like that? Can we somehow make it linear at the very minimum?

The other thing is, will everybody have the capacity to understand what a transformation or a SQL command is, et cetera? Can we help them along the way with either UI or intelligent recommendations? So, for example, say we have two accountants joining ClicData, right? The first accountant is very knowledgeable: he uploads his Excel or his data from QuickBooks or one of the financial packages. And then he starts doing some charts and graphics and dashboards. Can we somehow capture the intelligence of how he built that, whether the way he visualized it or the way he manipulated the data, in such a way that the next accountant that joins ClicData can benefit from that understanding, without revealing, obviously, the underlying data?

Can we look at two Excel spreadsheets and determine if their DNA is very similar, in such a way that I can use the same processes that I used on this Excel, knowing that an Excel is a highly non-schema-fixed type of structure, and that it could be in a different language, for all we know, or not even have column titles?

But can we have a DNA footprint of that and apply it? “Oh, somebody just uploaded and connected his data, which is very much like this other data. Can we recommend these types of cleansing and joining and visualizations for this data, as we did for this one?”
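One way to picture such a “DNA footprint” is a fingerprint built only from the shape of the data, so that column titles and language do not matter. The sketch below is an illustration under that assumption, not ClicData’s method: every name in it is hypothetical, cells are assumed to be strings, and the two-way type test and the 0.9 uniqueness threshold are arbitrary choices.

```python
from collections import Counter


def cell_kind(cell: str) -> str:
    """Classify one cell as 'empty', 'number', or 'text'."""
    cell = cell.strip()
    if not cell:
        return "empty"
    try:
        float(cell.replace(",", ""))
        return "number"
    except ValueError:
        return "text"


def column_signature(values) -> str:
    """Describe a column by its dominant cell kind and whether values repeat."""
    kinds = Counter(cell_kind(v) for v in values)
    kinds.pop("empty", None)
    if not kinds:
        return "empty"
    kind = kinds.most_common(1)[0][0]
    non_empty = [v.strip() for v in values if v.strip()]
    distinct_ratio = len(set(non_empty)) / len(non_empty)
    cardinality = "unique" if distinct_ratio > 0.9 else "repeating"
    return f"{kind}/{cardinality}"


def fingerprint(rows) -> Counter:
    """Multiset of column signatures; ignores titles, language, and column order."""
    return Counter(column_signature(list(col)) for col in zip(*rows))


def similarity(fp_a: Counter, fp_b: Counter) -> float:
    """Multiset Jaccard similarity: shared signatures over all signatures."""
    overlap = sum((fp_a & fp_b).values())
    total = sum((fp_a | fp_b).values())
    return overlap / total if total else 1.0


# Hypothetical sheets: two invoice-like exports from different countries, one log.
rows_en = [["INV-1", "Acme", "1200.50", "2"],
           ["INV-2", "Acme", "80", "1"],
           ["INV-3", "Globex", "15.25", "4"]]
rows_pt = [["F-10", "Lisboa Lda", "300", "7"],
           ["F-11", "Lisboa Lda", "45.5", "3"],
           ["F-12", "Porto SA", "12", "9"]]
rows_log = [["error", "disk full"],
            ["warn", "retry"],
            ["error", "timeout"]]

if __name__ == "__main__":
    print(similarity(fingerprint(rows_en), fingerprint(rows_pt)))
    print(similarity(fingerprint(rows_en), fingerprint(rows_log)))
```

A high score between a new upload and a sheet that was already cleaned could be the trigger for recommending the same cleansing, joining, and visualization steps. A real system would need far richer signals than this toy profile.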

So that is kind of one of the things that we’re working on avidly, because this is really, truly the democratization of, in a sense, a certain level of data science, meaning at least in the data preparation area and data visualization: the best practices, if you will, of how to present and modify your data so that it becomes usable quickly, which a lot of our customers still, you know, struggle with, because that’s not their core business. Obviously their core business is whatever, you know, they’re building or providing services for. So that is definitely one area where we see a huge potential for us, again, because we are a SaaS BI platform. We have all this data that we can work with, as opposed to individual, you know, BI implementations, which are really disconnected from each other. They don’t see what other companies are doing, or collect that intelligence and provide it to others.

Curtis: Got it. I mean, that’s a super valuable problem to solve. Have you found, I’m just curious, and you know, if anything’s proprietary, obviously, you know, you don’t share anything that you don’t want to, but how are you approaching that? Have you found it successful to be able to say, like you’re saying, here are two different spreadsheets, and we see patterns, and therefore, you know, we bubble up to the user like, “Hey, you should shift this this way, this, this way.” Or maybe you do that automatically. You know, how are you doing that? That’s such a massive problem to try and solve.

Telmo: Yeah, no, absolutely. I think our biggest problem right now is, ClicData is kind of the, the little guy coming up. And one of the things that we need is more data. And interestingly enough, when, when you have such disparate industries, as we do as, as customers, it’s, it’s still hard to kind of cluster those footprints, if you will, in such a way that is statistically meaningful at this point in time. And as much as we, we want to advance that. So that’s been one of our biggest challenges.

One thing that has helped us quite a bit, though, and something that we are actually looking at, is, again, as with most machine learning algorithms, to get humans to help out. So we have a lot of data scientists, either through customers or externals, that have already done a lot of work with different tools and different data sets.

And that’s the next thing that we’re trying to see: “Well, if we don’t have enough volume to start detecting these patterns in a statistically meaningful way, can we help it with some human-generated models that potentially could help us advance faster?” Right? Until we’re better known, or we have a lot more customers, so that we can start making these assumptions a little bit more strict.

Nonetheless, we’re still in the middle of it. And it’s something that our engineers are kind of, attempting to do in the middle of all the other issues that they have to fix. It’s definitely one of our biggest investments in the next couple of years to see how we can really have an Excel being uploaded in any language and kind of say, “well, I think you want to do this” and, expecting. Right?

Curtis: Yeah. Yeah, that would be huge. And you did mention a couple of other things that you’re doing in addition to that to help this data preparation process, like using NLP, right? To have better queries, user queries, classification, these kinds of things. Can you talk a little bit about those techniques and how that also helps the process?

Telmo: Yeah, absolutely.

We’ve seen natural language processing. There are actually quite a few tools out there. Some of them have been embedded in tools such as Tableau and Power BI, where the users can type their queries in English or in their mother tongue, and the dashboard or the visualization appears, or at least suggestions of visualizations of the data. And I think that has progressed a lot in the last few years, but I also find it’s been kind of a very niche and a very experimental thing.

I still don’t see it in full use in many cases yet, but I do feel that that is going to be great in terms of people no longer having to learn SQL, as good a language as it is, or any other language if SQL doesn’t cut it, any type of technical query language, and starting to use natural language to get their queries. I think that is one of the areas that needs to advance faster, in my opinion. And I think it’s going to. Much like we’ve seen in other areas, I think it’s going to advance quite fast, if it hasn’t already in certain areas.

The inverse of that is actually the system generating language, generating descriptions. I find that fascinating because it removes human bias, in a sense, from people reading charts, you know. I’ve always had this great story, which is from the earliest phases of ClicData, when I was going around, you know, basically Silicon Valley, looking for investors and saying, “Hey, we got this great idea, ClicData. You guys want to look at us?” et cetera. And the joke was, you know, I would show our MRR, you know, the growth of recurring revenue. And they said, “Well, that seems to be pretty slow.” And I would just shrink the chart like this and, you know, just shrink the horizontal axis, and they’d go, “Oh yeah, that one, it looks much better now.” Yeah, I know. Right?

So these are the types of tricks that, unconsciously or consciously, humans do to kind of fake their perception of the data. Right?
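The chart-shrinking trick is plain geometry: the data stays the same, but the angle at which the trend line is drawn depends on the plot’s width and height. A minimal sketch, with hypothetical pixel sizes and a made-up function name:

```python
import math


def apparent_angle_deg(rise_fraction: float, plot_width_px: float,
                       plot_height_px: float, run_fraction: float = 1.0) -> float:
    """Angle above horizontal at which a straight trend line appears on screen.

    rise_fraction: share of the vertical axis the line climbs (0..1).
    run_fraction:  share of the horizontal axis the line spans (0..1).
    """
    rise_px = rise_fraction * plot_height_px
    run_px = run_fraction * plot_width_px
    return math.degrees(math.atan2(rise_px, run_px))


if __name__ == "__main__":
    # Same revenue series, spanning the full plot area in both cases.
    wide = apparent_angle_deg(1.0, plot_width_px=800, plot_height_px=300)
    narrow = apparent_angle_deg(1.0, plot_width_px=200, plot_height_px=300)
    print(f"wide chart:   {wide:.1f} degrees")    # about 20.6
    print(f"narrow chart: {narrow:.1f} degrees")  # about 56.3
```

Squeezing the horizontal axis from 800 to 200 pixels nearly triples the visual slope without changing a single number in the series.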

Pie charts, we talk about Stephen Few, a great data visualizer. I mean, I, I’m not going to go into all the examples how you can mislead the brain into thinking certain things just by looking at a chart that has been done in a certain way or in a certain color.

And there’s nothing more pure than a computer actually analyzing that statistically, looking at the curve and saying, “You know, something happened here. This has gone down 60%.” And putting that in language that is easily understandable, and generating that.
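As a toy illustration of that idea, the sketch below scans a series for its largest period-over-period drop and phrases the result as a sentence. It is a template-based stand-in, not a real natural language generation system, and the function name and sample figures are invented for the example.

```python
def describe_biggest_drop(labels, values) -> str:
    """Return one sentence describing the largest period-over-period decline."""
    worst = None  # (percent_change, index of the period that dropped)
    for i in range(1, len(values)):
        previous, current = values[i - 1], values[i]
        if previous <= 0:
            continue  # a percentage change is undefined or misleading here
        pct = (current - previous) / previous * 100
        if pct < 0 and (worst is None or pct < worst[0]):
            worst = (pct, i)
    if worst is None:
        return "No period-over-period decline was found."
    pct, i = worst
    return (f"{labels[i]} fell {abs(pct):.0f}% compared with {labels[i - 1]} "
            f"({values[i - 1]:,} to {values[i]:,}).")


if __name__ == "__main__":
    months = ["Jan", "Feb", "Mar", "Apr"]
    sales = [1000, 1100, 440, 500]  # hypothetical monthly sales
    print(describe_biggest_drop(months, sales))
    # Mar fell 60% compared with Feb (1,100 to 440).
```

The sentence is computed from the numbers alone, so it reads the same however the chart happens to be stretched or colored.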

So natural language generation, to me, is something that is of extreme value, not to mention the fact that it can now open the doors for people with visual impairments, et cetera, or even, you know, facilitate the fact that you don’t have to take your eyes off the road as you’re driving, hearing your sales for the last week being spoken to you – I mean, this is stuff that you see in movies that is, you know, becoming reality, and we like that, but that is the kind of stuff that is in the works.

There’s nothing. I’m not saying that there’s nothing new there. There’s a lot of stuff still to be done there, but it’s definitely not groundbreaking.

What I really like is better data connectivity and performance via predictive. This stuff of understanding what you’re connecting to and having the metadata come alive. And this DNA thing that we talked about before. Giving the platform this understanding by looking at things such as Wikipedia, by looking at the opendata.orgs of this world, and scanning this data and saying, “Well, I’m fairly sure I know what this data is. You’ve basically uploaded some sales data for this product, which is obviously in the retail industry, or in this sector, et cetera. And by the way, I can pull out all the logos for you. I can pull out additional complementary information for you from free data sources out there.” And enriching that data automatically, without the user even having to spend time.

That stuff to me is really a killer feature for me. Because, it’s really taking advantage of everything that potentially machine lang . . . machine learning has to offer, which is giving the power of understanding, understanding in a, in a very, quoted way, the fact that this user is trying to use this data and he made links to other data sets automatically, even data sets that don’t exist internally, but outside available on the internet. Right?

But the challenge with all of them today, by the way, is the fact that they’re all very domain specific. With natural language processing, it’s easy to say, “Give me the sales of last year compared to this year.” And boom, Power BI comes up with that. And it’s easy because there’s probably just one table or one data set called “sales” in there. Right?

So that’s easy to do, but what happens if you’re doing this for a multitude of customers? What if sales no longer really means sales, but something different, because there are semantics involved? Or what if you’re doing this, for example, on ClicData, where we host a variety of data sets, and again, because of privacy and so forth, we’re also limited in terms of what we can expose and understand about them?

Does that domain contextual info of that, of the dataset of our customer, does that come into play? Yes, it does. It now opens up the doors. It goes, what are they talking about? Right? Like sales, what sales? There’s too many sales here. And the quality of the content as well, again, in a very highly implemented data silo, a company can definitely clean their data much, much better than, than a normal company would, that does not have the skills or the resources to do it. So, using these machine learning algorithms, there is going to be very, they will behave very poorly, mostly because the data is just not up to par.
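The “what sales?” problem can be sketched as a lookup that refuses to guess. In the example below, a word from a natural-language question is matched against the data set names in one customer’s account, and the answer is only trusted when exactly one name matches. The function, the matching rule (case-insensitive substring), and the data set names are all hypothetical.

```python
def resolve_dataset(term: str, dataset_names):
    """Map a word from a user's question to a data set, or report why it can't be done.

    Returns a (status, matches) tuple where status is 'resolved',
    'ambiguous', or 'unknown'.
    """
    needle = term.lower()
    matches = sorted(name for name in dataset_names if needle in name.lower())
    if len(matches) == 1:
        return ("resolved", matches)
    if not matches:
        return ("unknown", [])
    return ("ambiguous", matches)


if __name__ == "__main__":
    # A single-tenant install with one obvious table: easy.
    print(resolve_dataset("sales", ["Sales", "Customers"]))
    # A hosted account with many look-alike data sets: needs a follow-up question.
    print(resolve_dataset("sales", ["Sales_EU", "Presales leads", "Sales targets"]))
```

In the ambiguous case the system has to ask the user, or lean on domain context it may not be allowed to inspect for privacy reasons, which is exactly the difficulty described above.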

And then let’s not even talk about language, right? I mean, with, with the different languages and everything else that we have, you upload a spreadsheet of Japanese brands versus one in English, one in Dutch, and you’re trying to make sense of it all. And then you mix and match. This is really tough and advanced stuff and I think that’s why it excites me at least, because it’s so challenging to get all these things in place with all these variables.

Curtis: So, so maybe a point here on, this thing that you’re most excited about, obviously there’s a lot of ground to cover. What do you feel like you have been the most successful at in terms of creating something with this predictive usage analytics or, whatever it is that’s really generated the best outcomes in your mind that you’ve been able to do based on all your research and all your work?

Telmo: Interestingly enough, the best stuff that we’ve come up with in terms of our predictive use of it has been the predictive use of our platform. It’s indirectly affecting our customers, but it’s not visible to our customers. It is really the understanding of when people use the platform around the world, when we can scale up, when we can scale down, where we need improvements in code, and all those internal processes that allow us to be highly functional, highly scalable, and yet at the same time highly efficient with our resources.

So it’s been something internal that we’ve been using. It’s really, all the algorithms we’ve put around understanding what are the periods that people use it? What type of queries do they use? What type of data? What are the data size roadblocks? I mean a lot about the data and how it’s stored in the database.

We use Microsoft Azure for all our data across the world. And it is very important, both from a cost perspective and from a performance perspective, that we can continue to optimize those items in such a way that it becomes a viable option for many of our customers.

So it’s all the data that we’ve been able to collect and analyze on the usage of our platform that has been the most useful for us at this point in time. Nothing yet customer-facing where the users can say, “Oh, you know, there’s this new feature, NLP or NLG, natural language generation, or something.” We don’t have that yet, for a variety of reasons, again, one of them being the fact that our customers are very diverse and that’s really not the core of their interest at this point in time.

But it’s rather the internal use of all those items. You know, case in point: identifying optimal settings by comparing different uses of databases across all our customers. We can compare 50 databases and monitor them independently with sample groups and find out which ones are reacting better to the code that we have, to the application that we have in place. And we kind of start determining the settings and configurations of each one, optimize them, and then repeat the process, and continue to do that iteratively until we find the best configuration for every single database that we have, while at the same time allowing for the fact that there may be some customers that have NoSQL data versus very structured data, and those parameters need to be changed accordingly as well. So that has been probably one of our biggest success factors.
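The loop described here (try settings, measure, keep what works, repeat until nothing improves) resembles a simple hill climb. The sketch below shows that generic technique on a single made-up setting; the transcript does not say which parameters ClicData tunes or how it measures them, so `simulated_latency_ms`, `neighbours`, and the cache-size parameter are stand-ins.

```python
def tune(measure, start, neighbours, max_rounds: int = 10):
    """Greedy iterative tuning: keep moving to any neighbouring setting that measures better.

    measure:    function(setting) -> cost, lower is better (e.g. query latency)
    neighbours: function(setting) -> iterable of settings to try next
    """
    best, best_cost = start, measure(start)
    for _ in range(max_rounds):
        improved = False
        for candidate in neighbours(best):
            cost = measure(candidate)
            if cost < best_cost:
                best, best_cost, improved = candidate, cost, True
        if not improved:
            break  # no neighbouring setting is better: stop iterating
    return best, best_cost


def simulated_latency_ms(cache_mb: int) -> float:
    """Stand-in for a real measurement on a sample group; its sweet spot is 512 MB."""
    return abs(cache_mb - 512) / 10 + 20


def neighbours(cache_mb: int):
    """Try halving and doubling the current setting."""
    return [max(1, cache_mb // 2), cache_mb * 2]


if __name__ == "__main__":
    setting, latency = tune(simulated_latency_ms, start=64, neighbours=neighbours)
    print(f"best cache size: {setting} MB, latency: {latency} ms")
```

Running one such loop per database, or per group of similar databases, is also a natural place to account for the structured versus NoSQL difference mentioned above, since each group would get its own measurements and its own best setting.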

Curtis: That’s great. And like you said, I mean, that has real benefits though, right? I mean, I’m assuming that saves a bunch of costs. And also when they’re doing queries, they’re getting returns faster, which, when you’re investigating data, is really important, because if you’re waiting a couple of minutes every time you run a query, you’re not going to get anything done. Those are the kinds of gains we’re talking about.

Telmo: Absolutely. Absolutely. I mean, yeah. See, if you can’t return a dashboard in under a few seconds, I’m not sure we’re going to keep those customers for a long time. Right? And obviously, the same way, I mean, we love Microsoft and other cloud storage and technologies, but they’re expensive. There are a lot of servers around the world that we keep, and we need to optimize that cost, as most companies do. It’s important to always add that cost-benefit ratio, right?

Curtis: Is there anything that you had wanted to cover or think is really interesting that you wanted to cover?

Telmo: No, I just, it, you know, again, we live in an interesting time where, you know, everybody talks about data science, data viz, data BI, big data, embedded analytics, democratizing BI. I mean, you name the term. It’s probably been, you know, invented, created, et cetera, but ultimately, when you’re a business and all you want to do is wake up in the morning and know how did I do yesterday? Or how did my team do?

I find it interesting that even after, you know, 30, 40 years of database and data warehouse technologies being in the market, of BI being a terminology, we still have this challenge today. And I think it’s because we change the rules of the game constantly. Right? Just as we were getting to a certain point to say, “Oh yeah, everybody’s going on premise,” you go, “No, everybody goes on cloud.” Okay. That changes the whole thing again. Right?

So it’s interesting to see that it’s a constantly evolving channel, you know, and market and sector. I think there’s a lot of confusion as to what machine learning is and what artificial intelligence is, big data, IoT, all these terminologies. I think, you know, it’s time as well, and it’s through podcasts like yours, through education, that I think it’s important for people to start learning for themselves and to understand more about what those things mean, both from a knowledge perspective and from a practical perspective: what can you do for my business?

And that’s ultimately what it comes down to: it’s about helping businesses. It’s not about building technologies and visualization just because. There are a lot of cool tools that do that. We do things with a purpose, which is to facilitate as much as possible everything from your data acquisition all the way to making decisions on that data. Right. Yeah, that’s it.

Curtis: That’s great. Without a decision, nothing matters. Right? Without action, nothing is going to get done. So, and by the way, when you finally solve the data preparation problem, call me ’cause I want to have you back on the show and like, tell me how you did it.

Telmo: I’ll keep you updated, Curtis.

 

Attributions

Music

“Loopster” Kevin MacLeod (incompetech.com)

Licensed under Creative Commons: By Attribution 3.0 License

http://creativecommons.org/licenses/by/3.0/