Pipeline processing of data – Python syntax
In the Python world, coroutines/generators are used to perform pipeline processing of finite or infinite data from any data source (such as sockets, collections, or other generators), as shown below:
Example 1: a grep() generator used as a pipeline stage.
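A minimal sketch of what a generator-based grep() pipeline stage might look like (the helper name and sample data are illustrative, not from the original example):

```python
def grep(pattern, lines):
    # Yield only lines containing the pattern; input is consumed lazily.
    for line in lines:
        if pattern in line:
            yield line

log_lines = ["GET /index.html", "POST /login", "GET /about"]
gets = grep("GET", log_lines)   # no line is scanned yet
print(list(gets))               # the pipeline runs only when consumed
```

Because grep() is a generator, it can be chained after another generator stage and will pull items one at a time, so the pipeline also works over infinite sources.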
Example 2:
wwwlog = open("access-log")
bytecolumn = (line.rsplit(None, 1)[1] for line in wwwlog)
bytes = (int(x) for x in bytecolumn if x != '-')
print("Total", sum(bytes))
In the above examples, data is processed as needed.
Java – Streams
Brian Goetz says Streams are about possibly-parallel, aggregate operations on data sets, and gives an example:
txns.stream()
    .filter(t -> t.getBuyer().getAge() >= 65)
    .map(Txn::getSeller)
    .distinct()
    .sorted(comparing(Seller::getName))
    .forEach(s -> System.out.println(s.getName()));
The documentation says: "To perform a computation, stream operations are composed into a stream pipeline. A stream pipeline consists of a source (which might be an array, a collection, a generator function, an I/O channel, etc.), zero or more intermediate operations (which transform a stream into another stream, such as filter(Predicate)), and a terminal operation (which produces a result or side-effect, such as count() or forEach(Consumer)). Streams are lazy; computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed."
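This laziness is the same behavior the Python generator pipelines above exhibit. A small sketch (with a hypothetical infinite source) of how intermediate stages do no work until a "terminal" step pulls elements:

```python
import itertools

def naturals():
    # Infinite source; safe only because downstream consumes lazily.
    n = 0
    while True:
        yield n
        n += 1

evens = (x for x in naturals() if x % 2 == 0)   # intermediate stage: no work yet
first_three = list(itertools.islice(evens, 3))  # "terminal" step pulls 3 items
print(first_three)  # [0, 2, 4]
```

Only as many source elements are generated as the terminal step demands, mirroring the on-demand consumption described for stream pipelines.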
My understanding is:
The ThreadPoolExecutor abstraction in Java 5 supports structuring concurrent applications that involve shared mutable state.
ForkJoinPool in Java 7 provides parallelism for CPU-bound operations that do not involve shared mutable state.
Is the Java Stream API mainly used to perform pipeline processing of data that does not involve shared mutable state?