
Scheduling, Ingesting Data & Hardware

Thread ID: 120068
Created: Aug 27, 2015 04:04 PM
Updated: Aug 28, 2015 05:29 PM
Platform: Big Data Platform
Replies: 2
Tags: General
issy
Asked On August 27, 2015 04:04 PM

Hi,

We are currently evaluating the different distributions of Hadoop (Cloudera, Hortonworks). The documentation on the Syncfusion distribution is quite sparse. We are looking at replacing a group of 4-5 workstations that do data processing using Base SAS and custom scripts.

All of the processing is automated including:

1. Collecting the raw files via FTP
2. Processing in SAS
3. Analyzing in SAS
4. Pushing into a DB
5. Generating reports


I have a few questions:

1. Is there a command line interface to get data into Hadoop that can be scheduled?
2. Is there a way to schedule tasks on a regular basis?
3. In the hardware FAQ you mention 'new' hardware. Is this a requirement, or can we use our existing workstations? They are quite high spec, with about 8 TB on each with RAID.
4. If you have multiple drives on a workstation without RAID configured, how would you configure Hadoop to make use of all of the drives? What we want to do is have the OS drive use SSD, and the data drives be normal large drives.
5. Would it work with an external drive bay?

issy
Replied On August 27, 2015 04:07 PM

Also, would we be able to access Hadoop from non-Windows clients? I see the Studio is Windows only. We have a mix of Mac, Windows, and Linux clients.

Daniel Jebaraj [Syncfusion]
Replied On August 28, 2015 05:29 PM

Hi Issy,

Thank you for your interest in the Syncfusion Big Data Platform. 
 

S.No | Query | Response

1. Is there a command line interface to get data into Hadoop that can be scheduled?

We have several options to achieve this.

· A Java program can transfer data from FTP to HDFS directly, and it can easily be scheduled on a regular basis using Oozie.

· If files are accumulated through streams of activity (such as logging), Flume is a good choice. We have a special implementation of Flume that we can provide.

· Alternatively, if files from FTP are collected and stored on the local system, the Hadoop command below can be used from the command line to copy data from the local file system to HDFS.

Hadoop command line interface directory: C:\Syncfusion\BigDataSDK\<version>\SDK\Hadoop\bin

hdfs dfs -copyFromLocal <local_file_location> <target_hdfs_location>
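
For scheduled ingestion, the copy command can be wrapped in a small script that a scheduler invokes. The following is a minimal Python sketch and not part of the Syncfusion distribution. It assumes the `hdfs` launcher is reachable on the PATH (for example, by running from the SDK's Hadoop bin directory); the function names and the `hdfs_executable` parameter are hypothetical helpers introduced for illustration.

```python
import subprocess
from pathlib import Path


def build_copy_command(local_file, target_dir, hdfs_executable="hdfs"):
    """Return the argument list for `hdfs dfs -copyFromLocal <local> <target>`."""
    return [hdfs_executable, "dfs", "-copyFromLocal", str(local_file), target_dir]


def upload_directory(local_dir, target_dir, run=subprocess.run,
                     hdfs_executable="hdfs"):
    """Copy every regular file directly inside local_dir to target_dir in HDFS.

    `run` defaults to subprocess.run and is injectable so the function can be
    tried without a cluster. Returns the list of commands that were issued.
    """
    commands = []
    for path in sorted(Path(local_dir).iterdir()):
        if path.is_file():  # subdirectories are skipped
            command = build_copy_command(path, target_dir, hdfs_executable)
            run(command, check=True)  # subprocess.run raises on a non-zero exit code
            commands.append(command)
    return commands
```

A scheduled job could then call something like `upload_directory(r"C:\ftp_landing", "/data/raw")`, where both locations are placeholders. On Windows the launcher may need to be named explicitly (for example `hdfs.cmd`), which is why the executable name is a parameter rather than hard-coded.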

2. Is there a way to schedule tasks on a regular basis?

Yes. Hadoop tasks can be scheduled on a regular basis using Oozie. Please refer to the link below to learn about Oozie in detail.

http://oozie.apache.org/

We provide support for Oozie in our platform.

http://helpbdp.syncfusion.com/bigdata/big-data-studio/oozie
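
As an illustration of regular scheduling, Oozie runs a workflow on a timer through a coordinator definition. The Python sketch below renders a minimal daily coordinator XML. The function name, application name, dates, and workflow path are hypothetical, and the schema version (`uri:oozie:coordinator:0.4`) is an assumption that should be checked against the Oozie version shipped with the platform.

```python
from xml.sax.saxutils import escape, quoteattr

# Doubled braces keep the Oozie EL expression ${coord:days(1)} literal in str.format.
COORDINATOR_TEMPLATE = """<coordinator-app name={name} frequency="${{coord:days(1)}}"
                 start={start} end={end} timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>{app_path}</app-path>
    </workflow>
  </action>
</coordinator-app>
"""


def render_daily_coordinator(name, start, end, workflow_app_path):
    """Render a coordinator that triggers the given workflow once per day.

    start and end use Oozie's UTC timestamp form, e.g. 2015-09-01T00:00Z.
    """
    return COORDINATOR_TEMPLATE.format(
        name=quoteattr(name),    # quoteattr adds the surrounding quotes
        start=quoteattr(start),
        end=quoteattr(end),
        app_path=escape(workflow_app_path),
    )
```

The workflow stored at the given application path would hold the actual actions (ingest, processing, export); the coordinator only decides when that workflow runs.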

3. In the hardware FAQ you mention 'new' hardware. Is this a requirement, or can we use our existing workstations? They are quite high spec, with about 8 TB on each with RAID.

You can use your existing workstations. HDFS clusters do not benefit from using RAID for data node storage, because HDFS handles replication between nodes by itself.

Hence we do not recommend using RAID on any of the data nodes or client machines when forming a Hadoop cluster. RAID can, however, be used for name nodes.

Please refer to the following user guide link for forming a cluster.

http://helpbdp.syncfusion.com/bigdata/cluster-manager/cluster-creation

4. How would you configure Hadoop to make use of all of the drives? What we want to do is have the OS drive use SSD, and the data drives be normal large drives.

By default, in a Syncfusion cluster, a data node makes use of all fixed-type drives on a machine.

Hadoop data nodes can be restricted to specific drives by changing the dfs.datanode.data.dir property of the hdfs-site.xml file, available under the advanced settings of our cluster manager application when creating a cluster.
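
To make the drive restriction concrete: `dfs.datanode.data.dir` takes a comma-separated list of directories, typically one per drive that the data node should use. The Python sketch below renders that property for hdfs-site.xml. The function name and the example directories are hypothetical, and the exact path syntax expected by the cluster manager on Windows is an assumption to verify before use.

```python
from xml.sax.saxutils import escape


def render_data_dir_property(data_dirs):
    """Render the hdfs-site.xml property listing the data node storage directories."""
    if not data_dirs:
        raise ValueError("at least one data directory is required")
    value = ",".join(data_dirs)  # HDFS reads this as a comma-separated list
    return (
        "<property>\n"
        "  <name>dfs.datanode.data.dir</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>\n"
    )
```

For the setup described in the question, calling it with only the large data drives, for example `render_data_dir_property(["D:/hadoop/data", "E:/hadoop/data"])`, would leave the SSD holding the OS out of HDFS storage.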

5. Would it work with an external drive bay?

Yes. With the Syncfusion cluster manager, data nodes can detect all fixed-type external drives, and these can be used for HDFS storage. The only requirement is that the volume be a fixed volume (and not a transient volume).

6. Would we be able to access Hadoop from non-Windows clients?

The Syncfusion Big Data Studio is a Windows-only tool.

However, accessing the Syncfusion Hadoop distribution running on Windows through the native command line interface from non-Windows clients is supported, just as with any other cluster. We can assist with this.

Thrift services such as the Spark and Hive thrift servers are platform independent in our solution. They can be accessed using the Java Thrift API from non-Windows clients as well.

https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-Thrift

 
The data processing requirement you described can be handled with the Syncfusion Big Data Platform. Following is a simple prototype for the requirement.
· FTP -> Hadoop: Use a Java program.
· Processing and analyzing data: Use Pig / Hive / Spark scripts in our Big Data Studio to process data in Hadoop.
· Pushing into DB: Using Sqoop, we can import and export processed data with SQL Server, MySQL, or Oracle.
· Scheduling with Oozie: All these tasks can be scheduled using Oozie on a regular basis.
· Spark also comes with a full machine learning library that can be used to build models. Once a model is built, it can be persisted as PMML. Syncfusion offers a PMML execution engine that can be used within your .NET applications. For other platforms, alternate PMML engines are available.
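
The "pushing into DB" step in the prototype can be sketched as a Sqoop export invocation. The Python below only assembles the command line and does not run it. The function name, JDBC URL, table, user, and paths are hypothetical, and it assumes `sqoop` is on the PATH with a matching JDBC driver installed.

```python
def build_sqoop_export(jdbc_url, table, export_dir, username, password_file):
    """Return the argument list for exporting an HDFS directory into a database table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,             # JDBC connection string of the target database
        "--username", username,
        "--password-file", password_file,  # file holding the password, instead of a literal
        "--table", table,                  # existing destination table
        "--export-dir", export_dir,        # HDFS directory with the processed output
    ]
```

The resulting list can be passed to `subprocess.run`, or the same arguments can be placed in a Sqoop action of an Oozie workflow so that the export runs on the same schedule as the other steps.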

Please refer to the following user guide link for more detail about Pig, Hive, Oozie, and Sqoop with the Syncfusion Big Data Platform.
http://helpbdp.syncfusion.com/bigdata/overview

We will be happy to provide a custom demo or assist you with the work to be performed.

Please let us know if you have any further queries on this. We look forward to working with you.
 
Best Regards,
Daniel

