
Syncfusion Dashboard: Hive Datasource fetch very slow

Hi,

I just set up the Syncfusion Dashboard platform and imported some test data into Hadoop (Avro files) via the Integration Platform. Then I loaded the data from Hadoop into Hive tables (converting the Avro files to tables); in total I have only 400 rows. I then used this Hive instance as the data source for a Grid dashboard.

When I try to fetch the data, it is very slow. My computer has 64 GB of RAM and SSDs, but fetching 400 rows takes 45 seconds. Can somebody point me to how to figure this out? All settings in Syncfusion are at their defaults.

Thanks!

3 Replies

Nandhini K (Syncfusion Team), July 3, 2017 12:31 PM UTC

Hi Bochkov,

Thanks for using Syncfusion products.
 
Please find the response to your query below.

Query:
When I try to fetch data it works very slowly. My computer has 64 GB RAM and SSDs, but fetching 400 rows of data takes 45 seconds. Can somebody point me to how to figure it out?

Response:
Reason for Slow Performance: 
 
  • HiveServer2 executes queries as MapReduce jobs, which involve job start-up overhead and multiple disk read/write operations, so queries take considerable time for both small and large data sets.
 
Below is a sample query that the dashboard generates to fetch data through HiveServer2 and bind it to the selected widget. This query is generated when the “contactname” column from the table “customers2” is added to the widget.
 
SELECT Sub_Table.Grid_Column_0 AS Grid_Column_0
FROM (
    SELECT customers2.contactname AS Grid_Column_0,
           ROW_NUMBER() OVER (ORDER BY customers2.contactname ASC) AS RowIndexColumn
    FROM default.customers2 AS customers2
    GROUP BY customers2.contactname
) Sub_Table
WHERE RowIndexColumn BETWEEN 1 AND 200;
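To make the semantics of the generated query concrete, here is a minimal plain-Python model of what it returns: GROUP BY yields the distinct contactname values, ROW_NUMBER() numbers them from 1 in ascending order, and BETWEEN 1 AND 200 keeps the first page of 200 rows (BETWEEN is inclusive). The function name page_of_distinct and the sample names are hypothetical; this only illustrates the SQL semantics, not how Syncfusion Dashboard or Hive executes the query, and it ignores NULL handling and collation differences.

```python
# Hypothetical plain-Python model of the dashboard's paging query:
#   GROUP BY contactname                     -> distinct values
#   ROW_NUMBER() OVER (ORDER BY ... ASC)     -> 1-based positions in ascending order
#   WHERE RowIndexColumn BETWEEN a AND b     -> inclusive page bounds
# Assumption: values are non-NULL strings compared with Python's default ordering.


def page_of_distinct(values, first=1, last=200):
    """Return the distinct values, sorted ascending, whose 1-based
    row number lies in the inclusive range [first, last]."""
    distinct_sorted = sorted(set(values))           # GROUP BY + ORDER BY ... ASC
    numbered = enumerate(distinct_sorted, start=1)  # ROW_NUMBER() starts at 1
    return [v for i, v in numbered if first <= i <= last]  # BETWEEN is inclusive


if __name__ == "__main__":
    names = ["Maria", "Ana", "Thomas", "Ana", "Hanna"]
    print(page_of_distinct(names, 1, 2))  # ['Ana', 'Hanna']
```

Requesting the next page would only change the bounds (for example 201 to 400); the whole distinct set is still grouped and numbered first.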
 
 
Metrics:
  • The following table shows the execution times of a count query and the query above in both HiveServer2 and Spark SQL.

Query                                                  HiveServer2   Spark SQL
SELECT with GROUP BY and ORDER BY (Avro-based table)   66 seconds    5 seconds
COUNT query                                            30 seconds    0.2 seconds
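The size of the gap in these measurements can be expressed as a speedup ratio (HiveServer2 time divided by Spark SQL time). The short sketch below only redoes that arithmetic on the reported numbers; the TIMINGS dictionary and the speedup helper are illustrative names, not part of any Syncfusion API.

```python
# Timings copied from the metrics table (seconds).
TIMINGS = {
    "SELECT with GROUP BY/ORDER BY": {"hive": 66.0, "spark": 5.0},
    "COUNT": {"hive": 30.0, "spark": 0.2},
}


def speedup(hive_seconds, spark_seconds):
    """How many times faster Spark SQL was than HiveServer2."""
    return hive_seconds / spark_seconds


if __name__ == "__main__":
    for query, t in TIMINGS.items():
        print(f"{query}: {speedup(t['hive'], t['spark']):.1f}x faster on Spark SQL")
```

Run as a script, it prints a speedup of 13.2x for the grouped SELECT query and 150.0x for the count query.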
 
 
Recommended Solution: 
Because HiveServer2 (MapReduce) is well suited to batch processing of large data sets, we recommend the Spark SQL data source for near-real-time analytics such as dashboard visualization. Spark SQL processes data in memory, which avoids repeated disk I/O operations.
 
  • Tables created in Hive can also be accessed from Spark SQL in the Syncfusion distribution, because both use the same metastore database.
  • You can therefore use the “Spark SQL” connection type in the Syncfusion Dashboard platform instead of “Hive”.
 
 
Regards, 
Nandhini K.


Ilya Bo, July 4, 2017 09:18 AM UTC

Nandhini, thanks for your answer!

One more thing I would like to ask:

When I try to create a Spark SQL data source, I don't see my tables.

I created the tables in several ways:

  1. I created a test table with the attached sample (Scala).
  2. I also created one by using AvroSerDe (HQL):
However, when I use a Hive data source, I can see them. What is the problem?

Thanks!



Attachment: AvroFileSchema1089509997_df60505f.zip


Dhivyabharathi Govindaraj (Syncfusion Team), July 5, 2017 12:22 PM UTC

Hi Ilya, 
 
We have created a new support incident under your Direct Trac account, since the reported query is considered an issue. Please follow the link below to access your account.
 
Regards, 
Dhivya 

