
Groovy Succinctly®
by Duncan Dickinson


CHAPTER 4

Data Streams


Introducing streams

In Chapter 3, we used the findAll and collect methods to filter and transform data. These methods are handy, but keep in mind that they return new data structures (such as a list or a map) from each call. As we chain them through a series of methods (i.e. a pipeline) each returned value uses an amount of memory that (eventually) gets wiped while the next method grabs another chunk of memory. It's all a bit "chunky."

When Java 8 was released, it included two new features—lambdas and streams. Groovy developers already had closures, so lambdas weren't particularly exciting. In fact, Groovy closures can simply be dropped into spots where lambdas are expected and the closure syntax is nicer (to my eye). At first glance, streams look to perform the same role as many of the methods that Groovy added to lists and maps. However, some very useful differences make streams worth investigating.

Tip: I tend to use the built-in Groovy methods such as each and findAll for smaller data or simple operations, but I use streams when it gets more complex.

Let's start by looking at how we work with streams. First of all, a source, which can be finite or infinite, is needed. For the examples we have been looking at, the source would be a finite list of items, but, because we're looking at weather recordings, the data could just as well be infinite. To keep things simple, we'll read in the finite amount of data from the previously utilized CSV file of weather data.

Streams will let us analyze data in a manner that looks a lot like SQL. We'll be able to filter the data, perform aggregations (e.g., sums and averages), map the data to different structures, and group and sort. This is all performed by using a sequence of method calls that adhere to the following process:

  1. Create the stream.
  2. Perform zero or more intermediate operations.
  3. Perform one terminal operation.

The elements in this process are referred to as a pipeline, and it's easy to see why. Picture the data starting in a pool or lake. The data then flows through a number of operations and terminates in a final output. I imagine something like the Nile or the Amazon starting from a source, flowing into different rivulets, and finally arriving at a river mouth. (I'm not overly poetic, so find an analogy of streams and pipes that best suits you!)
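Because streams are a Java API, the three-step pipeline can be sketched in plain Java as well as in Groovy. The numbers below are stand-in data, not our weather records:

```java
import java.util.List;

public class PipelineSketch {
    static int sumOfEvens(List<Integer> source) {
        return source.stream()                 // 1. create the stream
            .filter(n -> n % 2 == 0)           // 2. zero or more intermediate operations
            .mapToInt(Integer::intValue)
            .sum();                            // 3. one terminal operation
    }

    public static void main(String[] args) {
        System.out.println(sumOfEvens(List.of(1, 2, 3, 4))); // prints 6
    }
}
```

Each intermediate operation returns another stream, which is what makes the chaining possible.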

Our weather data is held in a list after being read from the CSV, which means creation of a stream is easy—you simply call the stream() method on the list.

Intermediate operations, such as filter and map, return a stream and allow for further intermediate operations. Think of these operations as applying a lens over the original data—subsequent operations will see the data through this lens. However, these operations are different from the built-in Groovy methods such as findAll because the intermediate operation doesn't return a new data value (e.g., a list of maps).

Intermediate operations are said to be lazy because they aren't actually processed until the stream reaches a terminal operation. Terminal operations mark the end of a stream and are generally used to reduce the data through some sort of calculation or to collect the data in some manner. Once a terminal operation has been called, you'll be unable to access that stream again, and you will need to create a new stream.
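Both the laziness and the one-shot nature of streams can be observed in a small plain-Java sketch; peek is an intermediate operation that records each element as it flows through, and the sample data is invented:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyDemo {
    public static final List<String> seen = new ArrayList<>();

    // Builds a pipeline with an intermediate operation only; peek records
    // each element as it flows through, so we can observe when work happens.
    public static Stream<String> buildPipeline() {
        return Stream.of("a", "b").peek(seen::add);
    }

    public static void main(String[] args) {
        Stream<String> pipeline = buildPipeline();
        System.out.println(seen);               // [] - nothing has been processed yet
        pipeline.collect(Collectors.toList());  // terminal operation triggers processing
        System.out.println(seen);               // [a, b]
        // Calling pipeline.count() here would throw IllegalStateException:
        // the stream has already been operated upon, so a new one is needed.
    }
}
```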

The original data source (e.g., the list) is not changed by the stream, and this is a desirable characteristic. While some of the stream methods open up the ability to modify the underlying data, doing this is not recommended—it will reduce your ability to achieve other efficiencies.

As we begin exploring streams, I'll describe the examples that follow by posing a question for the data, describing a method for answering it, and providing the output. I'll build up the use of streams over these examples so we can focus on a specific operation at each stage.

The first question is "Which records exist for February 2008?" Code Listing 51: Filtering a Stream treads some familiar ground in reading a CSV file, so the second half of the code listing is what interests us. In order to create a stream, we simply call weatherData.stream(), then we can provide the rest of the pipeline as a chain of method calls.

In order to extract the required records, let’s use the filter method twice—once for the year and again for the month. The filtering can be performed in a single call, but I find that breaking each filter item into its own call is more readable for a set of filters that are performing a boolean AND (where all elements must be true). I suggest using a single filter call for performing boolean OR. The filter method is passed a closure that defines the predicate that an entry must match to meet the filter. This is much the same as we saw in the findAll method.
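To make the AND-versus-OR styling concrete with something self-contained, here is a rough plain-Java equivalent over a few hand-made records (the data is a stand-in, not the book's CSV):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FilterStyles {
    // A few stand-in records; the real data comes from the weather CSV
    static final List<Map<String, String>> DATA = List.of(
        Map.of("Year", "2008", "Month", "02"),
        Map.of("Year", "2008", "Month", "03"),
        Map.of("Year", "2007", "Month", "02"));

    // Boolean AND: one filter call per condition reads clearly
    static List<Map<String, String>> feb2008() {
        return DATA.stream()
            .filter(r -> r.get("Year").equals("2008"))
            .filter(r -> r.get("Month").equals("02"))
            .collect(Collectors.toList());
    }

    // Boolean OR: better kept within a single filter call
    static List<Map<String, String>> febOrMarch() {
        return DATA.stream()
            .filter(r -> r.get("Month").equals("02") || r.get("Month").equals("03"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(feb2008().size());    // prints 1
        System.out.println(febOrMarch().size()); // prints 3
    }
}
```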

The terminal operation for the stream pipeline is collect(). This call causes the filter methods to be processed and the resulting subset to be coalesced. The mappedData variable then holds a list of CSVRecord items that match the filter predicates. If collect wasn't called, the mappedData variable would hold a stream that would be waiting for its terminal operation.

Code Listing 52: Partial Output from the Filtered Stream displays the first few JSON-formatted list items from the mappedData variable. These are generated as a list of lists by the JsonOutput library, but they could also have been handled using the approaches seen in the Chapter 3 Solution Fundamentals section. I'll improve on the output in the next example.

Code Listing 51: Filtering a Stream

import static groovy.json.JsonOutput.prettyPrint
import static groovy.json.JsonOutput.toJson
import static java.nio.file.Paths.get as getFile

@Grab('org.apache.commons:commons-csv:1.2')
import static org.apache.commons.csv.CSVFormat.*

def weatherData = RFC4180.withHeader()
    .parse(getFile('../data/weather_data.csv').newReader())
    .getRecords()

def mappedData = weatherData.stream()
    .filter { it.Year == '2008' }
    .filter { it.Month == '02' }
    .collect()

print prettyPrint(toJson(mappedData))

Code Listing 52: Partial Output from the Filtered Stream

[
    [
        "AU_QLD_098",
        "2008",
        "02",
        "01",
        "20",
        "5",
        "345.4"
    ],
    [
        "AU_QLD_098",
        "2008",
        "02",
        "02",
        "18",
        "8",
        "206.0"
    ],

The next question is "What was the daily rainfall for February 2008?" As you can see in Code Listing 53: Mapping a Stream, I create the stream and perform the filtering in the same manner as earlier listings. I then call the map intermediate operation and pass it a closure to perform the mapping. This closure is called for each item in the stream that meets the filter predicates. Within the closure I set up a map with two keys:

  • date holds a formatted string using the ISO 8601 format of YYYY-MM-DD.
  • rainfall holds the value of the 'Rainfall (millimetres)' field in the CSV, converted to a BigDecimal.

The mapping simply prepares a data structure that suits my need, nothing particularly complex. Another mapping could perform calculations to transform the data if needed, perhaps changing millimeters to inches in order to meet another system's requirements.
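As a sketch of that idea, here's a hypothetical millimetres-to-inches mapping in plain Java; the conversion factor of 25.4 mm per inch is standard, while the two-decimal rounding is my own choice:

```java
import java.util.List;
import java.util.stream.Collectors;

public class MappingDemo {
    // Hypothetical transformation: millimetres to inches (1 in = 25.4 mm),
    // rounded to two decimal places
    static List<Double> toInches(List<Double> millimetres) {
        return millimetres.stream()
            .map(mm -> Math.round(mm / 25.4 * 100) / 100.0)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(toInches(List.of(25.4, 345.4))); // prints [1.0, 13.6]
    }
}
```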

The call to collect results in mappedData holding a list of the maps created by the map operation. This is converted into JSON, as seen in Code Listing 54: Partial Output from the Mapped Stream, and it is more useful than the previous JSON output.

Code Listing 53: Mapping a Stream

import static groovy.json.JsonOutput.prettyPrint
import static groovy.json.JsonOutput.toJson
import static java.nio.file.Paths.get as getFile

@Grab('org.apache.commons:commons-csv:1.2')
import static org.apache.commons.csv.CSVFormat.*

def weatherData = RFC4180.withHeader()
    .parse(getFile('../data/weather_data.csv').newReader())
    .getRecords()

def mappedData = weatherData.stream()
    .filter { it.Year == '2008' }
    .filter { it.Month == '02' }
    .map {
        [date: "${it.Year}-${it.Month}-${it.Day}",
         rainfall: it.'Rainfall (millimetres)'.toBigDecimal()]
    }.collect()

print prettyPrint(toJson(mappedData))

Code Listing 54: Partial Output from the Mapped Stream

[
    {
        "date": "2008-02-01",
        "rainfall": 345.4
    },
    {
        "date": "2008-02-02",
        "rainfall": 206.0
    },

Carrying over from the previous example, we can use the collect method to perform a mapping operation as the terminating operation. Taking a look at Code Listing 55: Collecting a Stream, you'll see that I've taken out the call to the map operation and called the collect method, passing it the result of the Collectors.toMap static method. Remember that Groovy allows us to skip use of parentheses when calling a method, and I use this option with the collect call's arguments but use parentheses for the toMap call. This groups the arguments cleanly and avoids excess syntax.

The toMap method accumulates each stream item into a map item. The call is passed two closures—the first generates the key for the map; the second returns the value for the map. As you'll see in Code Listing 56: Partial Output from the Collected Stream, we now have a very compact answer to our question.

Code Listing 55: Collecting a Stream

import static java.nio.file.Paths.get as getFile
@Grab('org.apache.commons:commons-csv:1.2')
import static org.apache.commons.csv.CSVFormat.*

import static groovy.json.JsonOutput.prettyPrint
import static groovy.json.JsonOutput.toJson
import static java.util.stream.Collectors.toMap

def weatherData = RFC4180.withHeader()
        .parse(getFile('../data/weather_data.csv').newReader())
        .getRecords()

def mappedData = weatherData.stream()
        .filter { it.Year == '2008' }
        .filter { it.Month == '02' }
        .collect toMap({ "${it.Year}-${it.Month}-${it.Day}" },
                       { it.'Rainfall (millimetres)'.toBigDecimal() })

print prettyPrint(toJson(mappedData))

Code Listing 56: Partial Output from the Collected Stream

{
    "2008-02-21": 142.1,
    "2008-02-22": 53.5,
    "2008-02-23": 29.2,
    "2008-02-01": 345.4,
    "2008-02-24": 327.5,

Reducing

So far I have filtered a stream and mapped the data in a basic manner. I haven't performed any calculations to summarize the data, and this is where reduction comes in. You may have heard of the MapReduce model before—essentially, it consists of a process through which data is filtered, sorted, and transformed (the Map part), then summarized (the Reduce part). A summarizing operation might determine the average (mean), maximum, minimum, sum, or any other calculation that is performed across the data to return a value of interest.

Streams provide a variety of terminal operations that perform the Reduce component of MapReduce. What's more, a number of handy methods are provided by streams that save us from writing our own Reduce functions.

In addition to specific methods (such as average()), the collect terminal operation can be passed a Collector that performs some sort of reduction. In the previous section, I used the toMap method provided in the Collectors class, and this is a great place to check before writing your own Collector.

Let's prepare another question to answer: "What was the average rainfall for February 2008?"

Code Listing 58: Calculating an Average creates the stream then filters it in the same manner we saw in the previous examples. Once this has been laid out, we next call mapToDouble because this method lets us extract the values for the 'Rainfall (millimetres)' field (converted to a Double). We do this as the mapToDouble method returns a specialized stream (DoubleStream) that provides us with an easy way to determine the average of the filtered items.

So, while the original stream was for the list of records from the CSV, we use mapToDouble to slice out a stream based on a specified field/column. We can then use this stream to perform an aggregate operation—in this case it's the average() calculation. At this point, it would be reasonable to think we'd have a single value, the average rainfall for February 2008. However, we actually have an OptionalDouble, and this needs some explaining.
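Stripped of the CSV handling, the mapToDouble/average/orElse sequence looks like this in plain Java (sample values invented):

```java
import java.util.List;
import java.util.OptionalDouble;

public class AverageDemo {
    static double averageOrZero(List<String> readings) {
        OptionalDouble avg = readings.stream()
            .mapToDouble(Double::parseDouble)  // DoubleStream of the field values
            .average();                        // empty OptionalDouble for an empty stream
        return avg.orElse(0);                  // default supplied, no exception possible
    }

    public static void main(String[] args) {
        System.out.println(averageOrZero(List.of("10.0", "20.0"))); // prints 15.0
        System.out.println(averageOrZero(List.of()));               // prints 0.0
    }
}
```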

In programming, the concept of null represents the absence of a value—this means that null itself isn't a value. Null causes a lot of frustration for developers because it either creates exceptions or has to be handled so as to avoid those exceptions. In the Java and Groovy world, an attempt to call a method or access a property on a variable that is null (i.e., it holds no value) causes a NullPointerException and no small amount of annoyance.

In Code Listing 57: Damn You Null!, I have written some pretty stupid code that sets up a new map variable, assigns it to null, then tries to access an element that is no longer there. This is something you should never see in the wild, but imagine that it's a much larger application in which my details variable is assigned null by some other part of the code. Perhaps this assignment was an act of tidying up that I'm unaware of. Regardless, my code now throws a NullPointerException and comes to a screaming halt.

Code Listing 57: Damn You Null!

def details = [name: 'Fred']

details = null

println details.name

Groovy lets me handle this more elegantly through the use of the safe navigation operator (?.), and I could have used println details?.name to avoid the exception. Groovy also treats null as false, which means I can use the ternary operator: println details ? details.name : 'Null'.

The Java 8 release included the Optional class and its siblings, including OptionalDouble. Optional objects may or may not contain a value. This is a handy abstraction because it provides a wrapping that offers another approach to avoiding NullPointerExceptions. We can check that an optional contains a value by calling its isPresent method. Alternatively, we can simply call the optional's orElse method and provide a default value as an argument.
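A minimal Java sketch of the Optional behavior described above:

```java
import java.util.Optional;

public class OptionalDemo {
    public static void main(String[] args) {
        Optional<String> present = Optional.of("Fred");
        Optional<String> empty = Optional.empty();

        System.out.println(present.isPresent());      // true
        System.out.println(empty.isPresent());        // false
        System.out.println(present.orElse("Nobody")); // Fred
        System.out.println(empty.orElse("Nobody"));   // Nobody - the default kicks in
    }
}
```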

In Code Listing 58: Calculating an Average, the call to average() returns an OptionalDouble, so we then call its orElse method to access the value. If the stream turns out to be empty, average() returns an empty OptionalDouble, orElse returns zero (0), and a NullPointerException is avoided.

Code Listing 58: Calculating an Average

import static java.nio.file.Paths.get as getFile
@Grab('org.apache.commons:commons-csv:1.2')
import static org.apache.commons.csv.CSVFormat.*

def weatherData = RFC4180.withHeader()
    .parse(getFile('../data/weather_data.csv').newReader())
    .getRecords()

println weatherData.stream()
    .filter { it.Year == '2008' }
    .filter { it.Month == '02' }
    .mapToDouble { record ->
        record.'Rainfall (millimetres)'.toDouble()
    }
    .average()
    .orElse(0)

If I were to ask the question "What was the total rainfall for February 2008?" you'd be excused for thinking that you could simply swap out the call to average() with one to sum(), but, annoyingly, this isn't quite right. Code Listing 59: Calculating a Sum illustrates that the sum() method doesn't return an optional but instead gives us a value directly. It's always worth checking the API documentation to see whether you'll get a value or an optional returned.

Code Listing 59: Calculating a Sum

import static java.nio.file.Paths.get as getFile
@Grab('org.apache.commons:commons-csv:1.2')
import static org.apache.commons.csv.CSVFormat.*

def weatherData = RFC4180.withHeader()
    .parse(getFile('../data/weather_data.csv').newReader())
    .getRecords()

println weatherData.stream()
    .filter { it.Year == '2008' }
    .filter { it.Month == '02' }
    .mapToDouble { record ->
        record.'Rainfall (millimetres)'.toDouble()
    }
    .sum()

Tip: These examples are all based on "clean" data and assume that numeric fields in the CSV contain numbers. If your data is messy, it's worth using the map operation to tidy it up.

While we can use operations such as sum, average, max, min, and count to summarize a stream, we can also perform reduce operations in the collect call. This will let us answer the question "What were the rainfall stats for February 2008?"

Code Listing 60: Summarizing Data demonstrates just how easy streams make it to prepare summary statistics for a set of values. In this case, I have summarized the rainfall readings within the collect operation. In order to perform the aggregation, we pass the collect method a call to the Collectors.summarizingDouble method. This latter method accepts a closure that returns a Double value for each item in the stream. Here, we simply convert the rainfall reading to a Double value. The resulting output of the stream is then displayed in JSON format in Code Listing 61: Output of Summarizing Data.

Code Listing 60: Summarizing Data

import static groovy.json.JsonOutput.prettyPrint
import static groovy.json.JsonOutput.toJson
import static java.nio.file.Paths.get as getFile
import static java.util.stream.Collectors.summarizingDouble

@Grab('org.apache.commons:commons-csv:1.2')
import static org.apache.commons.csv.CSVFormat.*

def weatherData = RFC4180.withHeader()
        .parse(getFile('../data/weather_data.csv').newReader())
        .getRecords()

def summaryData = weatherData.stream()
    .filter { it.Year == '2008' }
    .filter { it.Month == '02' }
    .collect summarizingDouble { it.'Rainfall (millimetres)'.toDouble() }

print prettyPrint(toJson(summaryData))

Code Listing 61: Output of Summarizing Data

{
    "sum": 6283.5,
    "average": 216.67241379310346,
    "max": 380.6,
    "min": 29.2,
    "count": 29
}
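The summarizingDouble collector produces a DoubleSummaryStatistics object, which is what the JSON above reflects. A small plain-Java sketch with invented readings:

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

public class SummaryDemo {
    static DoubleSummaryStatistics summarize(List<Double> readings) {
        return readings.stream()
            .collect(Collectors.summarizingDouble(Double::doubleValue));
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics stats = summarize(List.of(29.2, 345.4, 380.6));
        System.out.println(stats.getCount()); // prints 3
        System.out.println(stats.getMax());   // prints 380.6
        System.out.println(stats.getSum());   // sum of the readings
    }
}
```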

Grouping

Breaking down the data into groups, especially for summary/aggregate operations, is often useful. We can achieve this in the collect operation through calls to the Collectors.groupingBy methods.

Let's try out grouping by posing the question "What were the monthly rainfall statistics for 2008?" The answer will look similar to Code Listing 60: Summarizing Data, but that only gave us a summary for a single month. Essentially, that solution summarized the filtered data, and the filter applied to a specific month and year combination. In order to solve this question, we'll have to filter for just the year, then calculate the summaries for each month. Luckily, we can do this with a stream.

Code Listing 63: Summarizing by Month sets up the stream and filters for items from the year 2008. We then use the map operation to set up a map for each item. This map contains two keys:

  • date—constructed using a helpful closure (getDate) I set up.
  • rainfall—uses a BigDecimal representation of the reading.

We now have a "data set" for 2008 represented as a series of date and rainfall fields. In order to answer the question, we'll need to group this data by month and, for each month, we must prepare the summary data. This is performed in the collect operation by calling the form of Collectors.groupingBy, which takes two arguments:

  • The classifier for the group provided as a closure. This is straightforward—we just grab the month component from the date field.
  • A collector that is used to perform a reduction. For this, we run summarizingDouble over the rainfall data.
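The classifier-plus-downstream-collector shape of groupingBy can be sketched in plain Java over a few invented readings:

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingDemo {
    static class Reading {
        final String month;
        final double rainfall;
        Reading(String month, double rainfall) {
            this.month = month;
            this.rainfall = rainfall;
        }
    }

    static Map<String, DoubleSummaryStatistics> byMonth(List<Reading> data) {
        return data.stream()
            .collect(Collectors.groupingBy(
                r -> r.month,                                    // classifier
                Collectors.summarizingDouble(r -> r.rainfall))); // downstream reduction
    }

    public static void main(String[] args) {
        Map<String, DoubleSummaryStatistics> stats = byMonth(List.of(
            new Reading("02", 345.4),
            new Reading("02", 206.0),
            new Reading("03", 12.5)));
        System.out.println(stats.get("02").getCount()); // prints 2
        System.out.println(stats.get("03").getMax());   // prints 12.5
    }
}
```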

Code Listing 62: Partial Output for Summary by Month provides a portion of the JSON displayed by Code Listing 63: Summarizing by Month. The JSON is a series of map items, each one representing a month from 2008 and containing a map of summary data.

Code Listing 62: Partial Output for Summary by Month

{
    "APRIL": {
        "sum": 6389.8,
        "average": 212.99333333333334,
        "max": 393.7,
        "min": 5.3,
        "count": 30
    },
    "SEPTEMBER": {
        "sum": 6498.5,
        "average": 216.61666666666667,
        "max": 395.2,
        "min": 11.1,
        "count": 30
    },

Code Listing 63: Summarizing by Month

import static groovy.json.JsonOutput.prettyPrint
import static groovy.json.JsonOutput.toJson
import static java.nio.file.Paths.get as getFile
import static java.util.stream.Collectors.groupingBy
import static java.util.stream.Collectors.summarizingDouble

@Grab('org.apache.commons:commons-csv:1.2')
import static org.apache.commons.csv.CSVFormat.*

import java.time.LocalDate

def weatherData = RFC4180.withHeader()
        .parse(getFile('../data/weather_data.csv').newReader())
        .getRecords()

def getDate = { y, m, d ->
    LocalDate.of(y.toInteger(),
            m.toInteger(),
            d.toInteger())
}

def monthlyData = weatherData.stream()
    .filter { it.Year == '2008' }
    .map {
        [date: getDate(it.Year, it.Month, it.Day),
         rainfall: it.'Rainfall (millimetres)'.toBigDecimal()]
    }.collect groupingBy({ it.date.month },
                         summarizingDouble { it.rainfall.doubleValue() })

print prettyPrint(toJson(monthlyData))

In Code Listing 63: Summarizing by Month, I prepared a LocalDate object for the date field, but this wasn't strictly needed because I could have used the two-digit month value for the date. I went that extra bit further so that my resulting data (and JSON) would have a real date value that could be more easily utilized by other code.

The groupingBy method allows us to create subgroups in a cascading fashion. This allows us to group the data to the level we need, then perform aggregation.

Subgroups will help us answer the question "What was the daily rainfall record as grouped by year then month?" This isn't asking us for a calculation, just a data structure in which the daily rainfall recordings are grouped by month, and the months are themselves grouped by year.

Code Listing 64: Grouping by Year then Month is somewhat like our previous example in that the groupingBy method is called within the collect operation. The first call to groupingBy is based on the year (the first argument), then it calls another groupingBy (the second argument). This second groupingBy works within the context of the grouped year (e.g., 2006), so that there's no crossover from data belonging to the same month in another year. It's at the second grouping (month) that we then call toMap to create a map in a manner similar to the one seen in Code Listing 55: Collecting a Stream.

Code Listing 64: Grouping by Year then Month

import static groovy.json.JsonOutput.prettyPrint
import static groovy.json.JsonOutput.toJson
import static java.nio.file.Paths.get as getFile
import static java.util.stream.Collectors.groupingBy
import static java.util.stream.Collectors.toMap

@Grab('org.apache.commons:commons-csv:1.2')
import static org.apache.commons.csv.CSVFormat.*

import java.time.LocalDate

def weatherData = RFC4180.withHeader()
        .parse(getFile('../data/weather_data.csv').newReader())
        .getRecords()

def getDate = { y, m, d ->
    LocalDate.of(y.toInteger(),
            m.toInteger(),
            d.toInteger())
}

def monthlyData = weatherData.stream()
    .map {
        [date: getDate(it.Year, it.Month, it.Day),
         rainfall: it.'Rainfall (millimetres)'.toBigDecimal()]
    }.collect groupingBy({ it.date.year },
                groupingBy({ it.date.month },
                    toMap({ it.date.day },{ it.rainfall })))

print prettyPrint(toJson(monthlyData))

As you can see in Code Listing 65: Partial Output of Grouping by Year then Month, the resulting JSON is a series of year objects containing a series of month objects, each containing a map of rainfall readings for each day in the month.

Code Listing 65: Partial Output of Grouping by Year then Month

{
    "2006": {
        "DECEMBER": {
            "1": 237.2,
            "2": 188.2,
            "3": 359.6,
            "4": 231.1,
            "5": 93.2,
            "6": 337.0,
            "7": 234.2,
            "8": 246.8,
            "9": 184.0,
            "10": 332.6,

The intent of our previous question was to restructure the data held in the CSV file, but it's more likely we'd ask "What was the average rainfall by month, for each year?" The solution code is similar to the previous code, but we call on averagingDouble instead of toMap within the subgroup.

Code Listing 66: Average by Year-Month

import static groovy.json.JsonOutput.prettyPrint
import static groovy.json.JsonOutput.toJson
import static java.nio.file.Paths.get as getFile
import static java.util.stream.Collectors.averagingDouble
import static java.util.stream.Collectors.groupingBy

@Grab('org.apache.commons:commons-csv:1.2')
import static org.apache.commons.csv.CSVFormat.*

def weatherData = RFC4180.withHeader()
        .parse(getFile('../data/weather_data.csv').newReader())
        .getRecords()

def monthlyAverages = weatherData.stream()
    .collect groupingBy({ "$it.Year" },
                groupingBy({ "$it.Month" },
                    averagingDouble {
                        it.'Rainfall (millimetres)'.toDouble()
                    }))

print prettyPrint(toJson(monthlyAverages))

In Code Listing 66: Average by Year-Month, you'll see that I skipped my conversion to a LocalDate and instead used the raw Year and Month values from the CSV. I'd prefer to use the LocalDate approach, but I wanted to demonstrate an alternative. Within the inner grouping (by month), I call the averagingDouble method from the Collectors class. This accepts a closure that I use to provide a double value for each record, then calculates the average across the values in the group.

This will give me a useful output structure of year objects, each containing their constituent months and the average rainfall for each month. You can see the first part of the output in Code Listing 67: Partial Output for Average by Year-Month.

Code Listing 67: Partial Output for Average by Year-Month

{
    "2015": {
        "10": 198.66774193548386,
        "11": 183.61666666666667,
        "12": 240.41935483870967,
        "01": 161.56129032258065,

Sorting

While reading the previous code listing, you might have noticed that the months weren't in order. This may not be an issue if the recipient code doesn't need the data to be sorted. Furthermore, by not worrying about ordering, the stream operations have greater flexibility with processing. Different chunks of the pipeline can be sent to other threads or even to other processors, and it won't matter when they're returned. Often you won't care about ordering, but not every consumer of your data will be so flexible.

The sorted method in the Stream class provides an intermediate operation that, you guessed it, returns a stream consisting of sorted elements. The first variant of sorted uses the natural order of the stream elements. This is great if you have a numeric stream such as the DoubleStream we saw earlier. The second variant lets us provide our own comparator, which will be handy for our next question: "Could I have the daily rainfall figures for February 2008 presented in descending order?"

Code Listing 68: Sorting a Stream filters for the appropriate records, then calls on sorted. If I had only a stream of the rainfall recordings, I could have simply relied on sorted without a comparator, but I would lose the day on which the recording was made, and I want this in my answer. This means I must provide my own comparator: a closure that accepts two arguments, the first and second objects for comparison. As you can see, the closure I've provided calls another closure (sortDescending), which uses the spaceship operator (mentioned in Operators) to perform the comparison.
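As an aside, plain Java would typically express a descending sort with a Comparator rather than a spaceship-operator closure; Comparator.reverseOrder() saves writing the comparison by hand (sample values invented):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SortingDemo {
    static List<Double> descending(List<Double> readings) {
        return readings.stream()
            .sorted(Comparator.reverseOrder()) // no hand-written comparison needed
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(descending(List.of(345.4, 380.6, 29.2)));
        // prints [380.6, 345.4, 29.2]
    }
}
```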

Instead of terminating the stream with a collect operation, I used forEach to display the results. Having seen Groovy's each method, you won't be overly surprised with the sample output seen in Code Listing 69: Partial Output for Sorting a Stream.

Code Listing 68: Sorting a Stream

import static java.nio.file.Paths.get as getFile
@Grab('org.apache.commons:commons-csv:1.2')
import static org.apache.commons.csv.CSVFormat.*

def weatherData = RFC4180.withHeader()
        .parse(getFile('../data/weather_data.csv').newReader())
        .getRecords()

def sortDescending = { n1, n2 ->
    n1.toBigDecimal() <=> n2.toBigDecimal()
}

weatherData.stream()
    .filter { it.Year == '2008' }
    .filter { it.Month == '02' }
    .sorted { day1, day2 ->
        sortDescending day2.'Rainfall (millimetres)',
                       day1.'Rainfall (millimetres)' }
    .forEach {
        println "$it.Day: ${it.'Rainfall (millimetres)'}"
    }

Naturally, we could use forEach to perform more serious actions—perhaps shooting the data out to a web service.

Code Listing 69: Partial Output for Sorting a Stream

14: 380.6
13: 373.0
06: 371.5
08: 370.8
07: 362.5
01: 345.4
24: 327.5

Conclusion

In this chapter we've filtered, grouped, sorted, and reduced the weather data. Those of you with an SQL background are probably seeing aspects of the select, where, and order by statements with aspects of aggregate functions. I tend to see streams from this perspective—they give me a domain-specific language for working with a data series.

Although I must load the data from a file, I use an existing library (Apache Commons CSV) to do the heavy lifting. This is similar to SQL's from statement, but the CSV library is less abstract, so I still need to do a little prep. As Java streams continue to be folded into general use, we may see libraries simplify this aspect further. For now, though, it's easy enough.

In this chapter’s examples, I have performed the CSV read as one unit of work and the stream operations as another. This means that I have needed an interim variable (such as weatherData or monthlyData) to hold the parsed CSV data. My goal was to improve readability and comprehension for learning, but the whole process can be performed in a single chain of method calls, as seen in Code Listing 70: A Single-Method Chain for a CSV File and a Stream. Ultimately, you'll make a call on what's more readable, but the chain really captures the process of reading in data, filtering, mapping, and reducing (summarizing).

Code Listing 70: A Single-Method Chain for a CSV File and a Stream

import static groovy.json.JsonOutput.prettyPrint
import static groovy.json.JsonOutput.toJson
import static java.nio.file.Paths.get as getFile
import static java.util.stream.Collectors.summarizingDouble

@Grab('org.apache.commons:commons-csv:1.2')
import static org.apache.commons.csv.CSVFormat.*

def displayJson = { data ->
    print prettyPrint(toJson(data))
}

displayJson RFC4180.withHeader()
        .parse(getFile('../data/weather_data.csv').newReader())
        .getRecords()
        .stream()
        .filter { it.Year == '2008' }
        .filter { it.Month == '02' }
        .collect(summarizingDouble { it.'Rainfall (millimetres)'.toDouble() })

There's a lot more to Java streams than I can cover here, and the following resources will help build your understanding:

  • Processing Data with Java SE 8 Streams is a two-part article by Raoul-Gabriel Urma from Java Magazine that provides a good tutorial:
      • Part 1 from the March/April 2014 issue.
      • Part 2 from the May/June 2014 issue.