This used to be a big hard drive

Small Data

k-rad. K, from kilobyte. A thousand. A thousand times rad. Back when a 40MB hard drive was perhaps average, or maybe around the time of those big 1.6GB hard drives, it meant “really cool”. Maybe today it would be ex-rad or p-rad. Anyway. There’s a reason for this trip down BBS or mIRC memory lane. We’ll get there.

kCura

I’m sure my readers are really just me on my tablet, me on my phone and maybe my mom. But I like to think of them as Database professionals from across industries and job roles. You may or may not have heard of kCura, if you aren’t in eDiscovery.

In a nutshell – they are a really neat software company in the hot (see what I did there… I like data.. it’s hot to me) space of eDiscovery. A space that is growing in many ways.

kCura makes Relativity, and they just made some announcements around Relativity 9 and some features coming out there. Relativity is a platform that had humble beginnings and has grown into a consistently sought-after eDiscovery platform that can handle the full end-to-end eDiscovery workload thrown at it. Great company run by some great folks, staffed with a really fun group of people. In fact, I’m writing this from day two of their annual user conference, Relativity Fest. Reminds me a lot of my SQL family. Over 1,000 people here who interact with the product in different roles. Great sessions, lots of friendships and a really good vibe.

Why The Talk of Kilobytes?

Think about what eDiscovery involves. While there are many shapes, sizes and reasons – it all revolves around data. Unstructured, multi-sourced data. In lawsuits, in compliance issues, in tracking internal audit information. E-mails, spreadsheets, documents, PowerPoint presentations, etc. These tools need to quickly process a lot of data from a lot of sources and then make it available to search and review, quickly. In legal matters this data needs to be quickly assessed to see if it is the small needle in the haystack of “relevant” or “responsive” (to the legal matter at hand) data or if it is in the larger pool of non-responsive data like those e-mails you wrote to your spouse about dinner. Relativity and the other tools in the space take that data in and do a lot to it, then reviewers – sometimes lots of them – do a lot of crazy searches trying to find the relevant and responsive data.

We live in a data world. Think about a corporation doing a lot of work in documents, e-mails, presentations, spreadsheets, etc. That’s a lot of data throughout their network. As the tools for eDiscovery have grown so has the data we use in our offices which are actually finally starting to look a little more paperless. As Andrew Sieja, CEO and founder of kCura, talked in his keynote about how much we store in photos and music simply because we can, he drove home the point that the new tools are allowing more data to be stored. The bounds of how much data comes into eDiscovery shops reminded me of that quote from the Field of Dreams movie.

If you build it, they will come.

Well, kCura is building it.

Relativity stores all of its data in SQL Server (today, pre-version 9). That makes sense for a lot of reasons. And it scales well vertically. But for this type of data? For a large matter (kCura said their biggest is a single case workspace with about 188,000,000 documents!) – that’s a LOT of data. I hate the term Big Data, but they use that here and it works well, I think.

They’ve done a few things right with some new architectural decisions. It may not be all built as they expand the offering into more areas and make more seamless transitions. But they are building it, alright.

What Did They Do?

I could write paragraphs here too. Go read about kCura’s Relativity Data Grid on their site. In a nutshell they fused a relational database for what works great in relational databases (volatile data, lookup data, user info, etc.) and a NoSQL data structure for what may work better in one of those technologies (data that is mostly inserted and then stays put, data that needs to be searched in extensive and dynamic ways, data that reaches into the hundreds of millions of rows of large extracted text).
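To make that split a little more concrete, here is a toy sketch of the hybrid idea. It is not how Relativity or Data Grid is actually built. The table, the column names and the helper functions are all made up for illustration: an in-memory SQLite table stands in for the relational side (small, volatile coding data) and a plain Python inverted index stands in for the NoSQL side (write-once extracted text that gets searched a lot).

```python
import sqlite3
from collections import defaultdict

# Relational side (hypothetical schema): small, frequently updated lookup data.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE document ("
    "id INTEGER PRIMARY KEY, custodian TEXT, coded_responsive INTEGER)"
)

# Document-store side (toy stand-in): large text that is inserted once,
# then searched many times through an inverted index of term -> doc ids.
extracted_text = {}
inverted_index = defaultdict(set)


def ingest(doc_id, custodian, text):
    """Store metadata relationally and the extracted text in the search store."""
    db.execute(
        "INSERT INTO document (id, custodian, coded_responsive) VALUES (?, ?, 0)",
        (doc_id, custodian),
    )
    extracted_text[doc_id] = text
    for term in text.lower().split():
        inverted_index[term].add(doc_id)


def search(term):
    """Return the ids of documents whose text contains the term."""
    return sorted(inverted_index.get(term.lower(), set()))


def mark_responsive(doc_id):
    """A reviewer's coding decision is volatile data, so it lives in SQL."""
    db.execute("UPDATE document SET coded_responsive = 1 WHERE id = ?", (doc_id,))


ingest(1, "alice", "Quarterly contract pricing review")
ingest(2, "bob", "Dinner plans for Friday")
ingest(3, "alice", "Contract amendment draft")

hits = search("contract")
for doc_id in hits:
    mark_responsive(doc_id)

responsive = [
    row[0]
    for row in db.execute(
        "SELECT id FROM document WHERE coded_responsive = 1 ORDER BY id"
    )
]
print(hits, responsive)
```

The point of the sketch is just the division of labor: the text search never touches the relational table, and the reviewer's updates never touch the big text store.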

So they looked at where their size and performance concerns were growing in their environment and it matched with some of the tables and types of data that maybe don’t make sense at large volumes in a relational engine. And they realized that it was time to look at something other than relational to store some of their data. They worked with some folks like my friends at Brent Ozar Unlimited to figure out which option made sense and verify their understandings – and they went down a path.

A really cool path. They didn’t throw the baby out with the bath water. They didn’t say “relational databases suck, let’s throw all this work and time out.” Instead they looked to use what they have and only made changes where necessary. They worked on some changes and now V9 has what is the start of a great addition to their future road map.

DataGrid is built on Elasticsearch, an open source, distributed approach to dealing with data. This data scales horizontally (sharding) onto commodity-based servers which can be scaled on demand and relatively seamlessly. There is built-in redundancy and availability through the multi-node, distributed approach. It is optimized for searching text, a really important part of Relativity.
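If sharding is new to you, here is a minimal sketch of the general idea: hash a document id and take it modulo the number of shards, so every document lands on a predictable shard and the load spreads out. This is only an illustration. The function names and document ids are invented, and `zlib.crc32` is a stand-in for whatever hash a real system uses, not a copy of Elasticsearch's routing code.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the document id (toy stand-in for a real routing hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def distribute(doc_ids, num_shards):
    """Group document ids by the shard they hash to."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return dict(shards)


# Hypothetical document ids for a workspace of 10,000 documents.
doc_ids = [f"DOC-{n:07d}" for n in range(10_000)]
placement = distribute(doc_ids, num_shards=5)

for shard, docs in sorted(placement.items()):
    print(f"shard {shard}: {len(docs)} documents")
```

Because the shard is computed from the id alone, any node can work out where a document lives without asking a central lookup table, which is a big part of why this style of storage spreads across commodity servers so easily.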

They are working on putting what needs to be in this format into this format, leaving the rest in SQL, which now has more room to breathe and is a lot happier doing its relational “thing”.

They aren’t alone in this hybrid approach. It is happening much more frequently, but it is still too rare. I applaud them for this decision. At Straight Path Solutions, we support quite a few Relativity customers and know about their challenges. As this gains stability and adoption, we see a real use case here for some of our bigger customers with larger workspaces in Relativity. I can’t wait to see if one of our larger customers would be a good candidate for the early adopter program.

By The Way - a Vendor Doing It Right

I complain a lot about software vendors. Not these guys. Not too much anyway. They do a few things really well here and I wanted to call that out here:

  • They give their customers insight into what they should do in SQL.
  • They listen to their user community and talk to their customers a lot.
  • They reach out to outside expertise like us or Brent or others when needed.
  • They care about and harp on maintenance, recoverability and all of the things that most vendors ignore completely. While they don’t own or manage a customer environment, they do a heck of a job trying to explain the importance of these things to their clients.
  • They throw a really great conference every year and interact seriously with their core and most vocal customers.
  • Their employees love working for them. I talked to a lot of them last year and this year so far at Fest. They feel empowered and equipped to solve the problems at hand.
  • They even provide best practice documentation on SQL Server configurations, backup and recovery best practices and other important DBA related items to their customers.

It’s great to see. They care and they want to do better. I wish more software vendors took this model and took criticism and suggestions half as well as they do. They aren’t perfect, but they are a lot better than many ISVs who make software that runs on SQL. I applaud their support and infrastructure engineering teams here as well.

If you are in the market for an eDiscovery platform – you should check them out. I love working with them and their customers.

If you already use Relativity? Reach out and talk to us. We love this space and have worked with quite a few kCura customers in optimizing their environments with our SQL Server Health Assessments or Remote DBA Services.
