Microsoft has recently announced a new, cloud-based data repository. Named Microsoft Research Open Data, it promises to give developers a new way to work with the truly massive data sets that power many modern projects.
Guided by the FAIR (findable, accessible, interoperable, and reusable) data principles, Microsoft aims to make it easy to store, share, and work with large data sets. Microsoft researchers and collaborators will be able to upload data sets to the service and share relevant information and tools with one another, with an emphasis on making results reproducible. The ramifications of such a project are vast: it not only makes it easier for data to reach the people who can use it, but also gives others the opportunity to refine that data and to identify and fix potential flaws.
Shared repositories could well be the next great advancement in managing large data sets. Working with data is so important that Syncfusion has published a number of e-books dedicated to it, including Hadoop Succinctly, Hive Succinctly, and HBase Succinctly. Syncfusion also offers two products devoted solely to managing large data sets: the Big Data Platform and the Data Integration Platform. Shared repositories like Microsoft Research Open Data can only help developers engage with data, and we're excited to see the amazing things that will come from a well-managed approach to data.
Are shared repositories something that would help you in working with data? What tools do you want to see? Let us know in the comments below.