Tuesday, 11 April 2017

Infinispan Archetype 1.0.19 Released

I'm pleased to announce that we have just released version 1.0.19 of the infinispan-archetype. This release focuses on making the archetype compatible with Infinispan 9.0 as well as adding a store archetype for creating custom cache writer/loader implementations.

Archetype Usage


To utilise the archetypes, use the following commands:
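For example, to generate a skeleton project (the archetype artifact ids shown here are assumed; substitute testcase-archetype or the new store archetype as required):

    mvn archetype:generate \
        -DarchetypeGroupId=org.infinispan.archetypes \
        -DarchetypeArtifactId=newproject-archetype \
        -DarchetypeVersion=1.0.19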


Contributing


If you encounter any issues with the archetypes, or would like to request additional archetypes, please raise an issue on GitHub.

Friday, 7 April 2017

In Memory Data Grid Patterns Demos from Devoxx France!

Devoxx France 2017 was a blast!! Emmanuel and I would like to thank all the attendees of our in-memory data grid patterns talk. The room was full and we thoroughly enjoyed the experience!

During the talk we presented a couple of small demos that showcased some in-memory data grid use cases. The demos are located here, but I thought it'd be useful to provide some step-by-step instructions here so that you can get them running as quickly as possible.

Before we start with any of the demos, it's necessary to run some set up steps:

  1. Check out the git repository:

    git clone https://github.com/galderz/datagrid-patterns

  2. Download Infinispan Server 9.0.0.Final and unzip it at the same level as the git repository.

  3. Go into the datagrid-patterns directory, start the servers and wait until they've started:

    cd datagrid-patterns
    ./run-servers.sh

  4. Install Anaconda for Python 3; this is required to run the Jupyter notebook for plotting.

  5. Install Maven 3.

Once the setup is complete, it's time to start with the individual demos.

Both demos shown below work with the same application domain: rail transport systems. In this domain, we differentiate between physical stations, trains, station boards which are located in stations, and finally stops, which are individual entries in station boards.

Analytics Demo


The first demo is focused on how you can use Infinispan for doing offline analytics. In particular, this demo tries to answer the following question:

Q. What is the time of the day when there is the biggest ratio of delayed trains?

To answer this question, the Infinispan data grid will be loaded with 3 weeks' worth of station board data. Once the data is loaded, we will execute a remote server task which uses Infinispan Distributed Java Streams to calculate the two pieces of information required to answer the question: per hour, how many trains are going through the system, and out of those, how many are delayed.

An important aspect to bear in mind about this server task is that it will only be executed on one of the nodes in the cluster; it does not matter which one. In turn, this node will ship the lambdas required to do the computation to each of the other nodes so that they can be executed against their local data. The other nodes will reply with their results and the node where the server task was invoked will aggregate them.

Then, these results are sent back to the client, which in turn stores them as JSON in an intermediate cache. Once the results are in place, we will use a Jupyter notebook to read them and plot the answer.
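As a rough illustration of the idea (this is not the actual demo code; the task name, the Stop class and its fields are hypothetical), such a server task could look something like this:

    import java.io.Serializable;
    import java.util.Map;
    import java.util.stream.Collectors;

    import org.infinispan.Cache;
    import org.infinispan.stream.CacheCollectors;
    import org.infinispan.tasks.ServerTask;
    import org.infinispan.tasks.TaskContext;

    public class DelayedTrainsPerHourTask implements ServerTask<Map<Integer, Long>> {

        // Hypothetical domain class: one entry of a station board
        public static class Stop implements Serializable {
            public int hour;      // hour of the day the train goes through
            public int delayMin;  // delay in minutes, 0 if on time
        }

        private TaskContext ctx;

        @Override
        public void setTaskContext(TaskContext ctx) {
            this.ctx = ctx;
        }

        @Override
        public String getName() {
            return "delayed-trains-per-hour"; // hypothetical task name
        }

        @Override
        @SuppressWarnings("unchecked")
        public Map<Integer, Long> call() {
            Cache<String, Stop> cache = (Cache<String, Stop>) ctx.getCache().get();
            // The lambdas below are shipped to every node and executed against its
            // local data; the node that invoked the task aggregates the partial results.
            return cache.values().stream()
                    .filter(stop -> stop.delayMin > 0)
                    .collect(CacheCollectors.serializableCollector(
                            () -> Collectors.groupingBy(stop -> stop.hour, Collectors.counting())));
        }
    }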

Let's see these steps in action:

  1. First, we need to install the server tasks in the running servers above:

    cd datagrid-patterns/analytics
    mvn clean install package -am -pl analytics-server
    mvn wildfly:deploy -pl analytics-server
    
  2. Open the datagrid-patterns repo in your favourite IDE and run the delays.java.stream.InjectApp class located in the analytics/analytics-server project. This will inject the data into the cache. In my environment, it takes between 1 and 2 minutes.

  3. With the data loaded, we need to run the remote task that will calculate the total number of trains per hour and how many of those are delayed. To do that, execute the delays.java.stream.AnalyticsApp class located in the analytics/analytics-server project from your IDE.

  4. You can verify that the results have been calculated by going to the following address:


  5. With the results in place, it's time to start the Jupyter notebook:

    cd datagrid-patterns/analytics/analytics-jupyter
    ~/anaconda/bin/jupyter notebook

  6. Once the notebook opens, open the live-demo.ipynb notebook and execute each of the cells in order. You should end up seeing a plot like this:


So, the answer to the question:

Q. What is the time of the day when there is the biggest ratio of delayed trains?

is 2am! That's because the last connecting trains of the day wait for each other to avoid leaving passengers stranded.

Real Time Demo


The second demo that we presented uses the same application domain as above, but this time we're trying to use our data grid as a way of storing the station board state of each station at a given point in time. So, the idea is to use Infinispan as an in-memory data grid for working with real-time data.

So, what can we do with this type of data? In our demo, we will create a centralised dashboard of delayed trains around the country. To do that, we will take advantage of Infinispan's Continuous Query functionality, which allows us to find those station boards which contain stops that are delayed, and as new delayed trains appear, these will be pushed to our dashboard.
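To give an idea of the moving parts (a sketch only, with a hypothetical query and value type, not the demo's actual code), the continuous query wiring over Hot Rod looks roughly like this:

    import org.infinispan.client.hotrod.RemoteCache;
    import org.infinispan.client.hotrod.Search;
    import org.infinispan.query.api.continuous.ContinuousQuery;
    import org.infinispan.query.api.continuous.ContinuousQueryListener;
    import org.infinispan.query.dsl.Query;
    import org.infinispan.query.dsl.QueryFactory;

    public class DelayedTrainsListener {

        // 'boards' is a remote cache of station boards (values kept as Object here
        // because the demo's actual domain classes are not reproduced in this sketch)
        static void listen(RemoteCache<String, Object> boards) {
            QueryFactory qf = Search.getQueryFactory(boards);
            // Hypothetical query: match boards containing delayed stops
            Query query = qf.create("from demo.StationBoard b where b.delayMin > 0");

            ContinuousQuery<String, Object> continuousQuery = Search.getContinuousQuery(boards);
            continuousQuery.addContinuousQueryListener(query,
                    new ContinuousQueryListener<String, Object>() {
                        @Override
                        public void resultJoining(String key, Object board) {
                            // a board started matching: show its delayed trains on the dashboard
                        }

                        @Override
                        public void resultLeaving(String key) {
                            // a board stopped matching: remove it from the dashboard
                        }
                    });
        }
    }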

To run this demo, keep the same servers running as above and do the following:

1. Run the delays.query.continuous.FxApp application located in the real-time project inside the datagrid-patterns repository. This app will inject some live station board data and will launch a JavaFX dashboard that shows delayed trains as they appear. It should look something like this:



Conclusion

This has been a summary of the demos that we ran in our talk at Devoxx France, with the intention of getting you running these demos as quickly as possible. The repository contains more detailed information about these demos. If there's anything unclear or any of the instructions above are not working, please let us know!

Thanks to Emmanuel Bernard for partnering with me for this Devoxx France talk and for the continuous feedback while developing the demos. Thanks as well to Tristan Tarrant for the input in the demos and many thanks to all Devoxx France attendees who attended our talk :)

A very special thanks to Alexandre Masselot whose "Swiss Transport in Real Time: Tribulations in the Big Data Stack" talk at Soft-Shake 2016 was the inspiration for these demos. @Alex, thanks a lot for sharing the demos and data with me and the rest of the community!!

In just a few weeks I'll be at Great Indian Developer Summit presenting these demos and much more! Stay tuned :)

Cheers,
Galder

Tuesday, 4 April 2017

Infinispan coming to Devoxx France 2017

Infinispan will be present at Devoxx France from 5th to 7th April 2017. Emmanuel Bernard and I will be speaking about in-memory data grid use cases with some cool demos around rail transport (who doesn't love trains?).

So, if you're at Devoxx France, or considering going there, and want to find out more about in-memory data grids and Infinispan, make sure you come to our talk!!

Cheers,
Galder

Monday, 3 April 2017

Infinispan Spark connector 0.5 released!

The Infinispan Spark connector offers seamless integration between Apache Spark and Infinispan Servers.
Apart from supporting Infinispan 9.0.0.Final and Spark 2.1.0, this release brings many usability improvements, and support for another major Spark API.

Configuration changes


The connector no longer uses a java.util.Properties object to hold configuration; that's now the duty of org.infinispan.spark.config.ConnectorConfiguration, which is type safe and both Java and Scala friendly:
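A minimal sketch (the setter names here are assumed; check the connector documentation for the full list of options):

    import org.infinispan.spark.config.ConnectorConfiguration;

    ConnectorConfiguration config = new ConnectorConfiguration()
            .setServerList("server1:11222;server2:11222")
            .setCacheName("exampleCache");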


 

Filtering by query String


The previous version introduced the possibility of filtering an InfinispanRDD by providing a Query instance, which required going through the query DSL and, in turn, a properly configured remote cache.

It's now possible to simply use an Ickle query string:
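For example, in Java (a sketch: the factory and filterByQuery signatures are written as I recall them from the connector's Java API, and example.User is a hypothetical protobuf entity):

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.infinispan.spark.config.ConnectorConfiguration;
    import org.infinispan.spark.rdd.InfinispanJavaRDD;

    JavaSparkContext jsc = new JavaSparkContext("local[*]", "ickle-filter-example");

    ConnectorConfiguration config = new ConnectorConfiguration()
            .setServerList("server:11222")
            .setCacheName("users");

    // Create the Infinispan-backed RDD and filter it at the source with an Ickle query string
    InfinispanJavaRDD<Long, Object> usersRDD = InfinispanJavaRDD.createInfinispanRDD(jsc, config);
    JavaPairRDD<Long, Object> adults = usersRDD.filterByQuery("FROM example.User WHERE age >= 18");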



Improved support for Protocol Buffers


Support for reading from a cache with protobuf encoding was present in the previous connector version, but it's now also possible to write using protobuf encoding and have protobuf schema registration handled automatically.

To see this in practice, consider an arbitrary non-Infinispan based RDD<Integer, Hotel> where Hotel is given by:
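For instance, a Hotel class annotated for ProtoStream-based schema generation could look like this (a sketch; the actual class used in the examples may differ):

    import org.infinispan.protostream.annotations.ProtoField;

    public class Hotel {

        private String name;

        @ProtoField(number = 1, required = true)
        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }
    }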


In order to write this RDD to Infinispan it's just a matter of doing:
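Roughly (a sketch: the write helper and the two protobuf-related configuration methods are named as I recall them from the connector API):

    import java.util.Arrays;

    import org.apache.spark.api.java.JavaPairRDD;
    import org.infinispan.spark.config.ConnectorConfiguration;
    import org.infinispan.spark.rdd.InfinispanJavaRDD;

    import scala.Tuple2;

    // An arbitrary, non-Infinispan RDD<Integer, Hotel>; jsc is the JavaSparkContext from the earlier snippet
    Hotel hotel = new Hotel();
    hotel.setName("Grand Hotel");
    JavaPairRDD<Integer, Hotel> hotelsRDD = jsc.parallelizePairs(Arrays.asList(new Tuple2<>(1, hotel)));

    // Register the annotated entity and let the connector generate and register the schema
    ConnectorConfiguration config = new ConnectorConfiguration()
            .setServerList("server:11222")
            .setCacheName("hotels")
            .addProtoAnnotatedClass(Hotel.class)  // assumed method name
            .setAutoRegisterProto();              // assumed method name

    // Write the RDD to Infinispan with protobuf encoding
    InfinispanJavaRDD.write(hotelsRDD, config);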

Internally the connector will trigger the auto-generation of the .proto file and message marshallers related to the configured entity(ies) and will handle registration of schemas in the server prior to writing.



Splitter is now pluggable


The Splitter is the interface responsible for creating one or more partitions from an Infinispan cache, with each partition related to one or more segments. The Infinispan Spark connector can now be created using a custom implementation of Splitter, allowing for different data partitioning strategies during job processing.


Goodbye Scala 2.10


Scala 2.10 support was removed; Scala 2.11 is currently the only supported version. Scala 2.12 support will follow https://issues.apache.org/jira/browse/SPARK-14220.


 

Streams with initial state


It is possible to configure the InfinispanInputDStream with an extra boolean parameter to receive the current cache state as events.
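For example, in Java (a sketch; the factory method is named as I recall it from the connector's Java streaming API):

    import org.apache.spark.storage.StorageLevel;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.infinispan.spark.config.ConnectorConfiguration;
    import org.infinispan.spark.stream.InfinispanJavaDStream;

    // jsc is the JavaSparkContext from the earlier snippets
    JavaStreamingContext jssc = new JavaStreamingContext(jsc, Durations.seconds(1));

    ConnectorConfiguration config = new ConnectorConfiguration()
            .setServerList("server:11222")
            .setCacheName("hotels");

    // The final 'true' asks the stream to deliver the current cache contents as events
    // before streaming subsequent cache modifications
    JavaInputDStream<?> stream =
            InfinispanJavaDStream.createInfinispanInputDStream(jssc, StorageLevel.MEMORY_ONLY(), config, true);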

 

Dataset support


The Infinispan Spark connector now ships with support for Spark's Dataset API, including predicate pushdown similar to rdd.filterByQuery. The entry point of this API is the Spark session:
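For example (standard Spark API):

    import org.apache.spark.sql.SparkSession;

    SparkSession session = SparkSession.builder()
            .appName("infinispan-dataset-example")
            .master("local[*]")  // for a local test run
            .getOrCreate();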


To create an Infinispan-based Dataframe, the "infinispan" data source needs to be used, along with the usual connector configuration:
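Something along these lines (a sketch; exactly how the connector configuration is passed as options may differ, and toStringsMap() is assumed here):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // 'config' is the ConnectorConfiguration and 'session' the SparkSession from the previous snippets
    Dataset<Row> df = session.read()
            .format("infinispan")
            .options(config.toStringsMap())  // assumed helper exposing the configuration as options
            .load();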

From here it's possible to use the untyped API, for example:
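For instance, using the hypothetical Hotel schema from earlier:

    import static org.apache.spark.sql.functions.col;

    // Untyped API: the projection and the predicate are pushed down to Infinispan as an Ickle filter
    df.select("name")
      .where(col("name").like("%Plaza%"))
      .show();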

or execute SQL queries by setting a view:
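For example (the view and column names follow the hypothetical Hotel example):

    // Register the Dataframe as a temporary view and query it with SQL
    df.createOrReplaceTempView("hotels");
    session.sql("SELECT name FROM hotels WHERE name LIKE '%Plaza%'").show();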

In both cases above, the predicates and the required columns will be converted to an Infinispan Ickle filter, thus filtering data at the source rather than during the Spark processing phase.


For the full list of changes see the release notes. For more information about the connector, the official documentation is the place to go. Also check the Twitter data processing sample, and to report bugs or request new features, use the project JIRA.



Friday, 31 March 2017

Infinispan 9

Infinispan 9 is the culmination of nearly a year of work. It is codenamed "Ruppaner" in honor of the city of Konstanz, where we designed many of the improvements we've made. Prost!

Performance


We decided it was time to revisit Infinispan's performance and scalability. So we went back to our internals design and we made a number of improvements. Infinispan 9.0 is faster than any previous release by quite a sizeable margin in a number of key aspects:

  • distributed writes, thanks to a new algorithm which reduces the number of RPCs required to write to the owners
  • distributed reads, which scale much better under load
  • replicated writes, also with better scalability under load
  • eviction, thanks to a new in-memory container
  • internal marshalling, which was completely rewritten

We will have a post dedicated to benchmarks detailing the difference against previous versions and in various scenarios.

Marshalling


We've made several improvements in the cluster and persistent storage marshalling layer which have resulted in increased performance and smaller payloads. Also, the new marshaller layer makes JBoss Marshalling an optional component, which is only used when no Infinispan Externalizers (or AdvancedExternalizers) are available for a given type, in which case it falls back on standard JDK Serializable/Externalizable capabilities for marshalling.

Remote Hot Rod Clients


We now ship alternate marshallers for remote clients based on Kryo and ProtoStuff.

Additionally, the Hot Rod protocol now supports streaming operations for dealing with large objects.
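With the Java Hot Rod client that looks roughly like this (a sketch of the streaming API as I recall it):

    import java.io.InputStream;
    import java.io.OutputStream;

    import org.infinispan.client.hotrod.RemoteCache;
    import org.infinispan.client.hotrod.StreamingRemoteCache;

    // 'cache' is an existing RemoteCache<String, byte[]>
    void copyLargeValue(RemoteCache<String, byte[]> cache, InputStream source) throws Exception {
        StreamingRemoteCache<String> streaming = cache.streaming();

        // Write a large value chunk by chunk instead of materialising it in memory
        try (OutputStream out = streaming.put("big-object")) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }

        // Read it back as a stream
        try (InputStream in = streaming.get("big-object")) {
            // consume the stream as needed
        }
    }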

Off-Heap and data-container changes


An In-Memory Data Grid likes to eat through your memory (because you want it to be fast!), but in the world of the JVM that is not ideal: that huge chunk of data gives Garbage Collectors a hard time when the heap goes into double-digit gigabyte territory. Long GC pauses can make individual nodes unresponsive, compromising the stability of your cluster.

Infinispan 9 introduces an improved data container which can optionally store entries off-heap.

Additionally, our bounded container has been replaced with Ben Manes' excellent Caffeine which provides much better performance. Check out Ben's benchmarks where he compares, among other things, against Infinispan's old bounded container.

Configuration-wise, the previously separate concepts of eviction, store-as-binary and data-container have been merged into a single 'memory' configuration element.
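Programmatically, that maps to the memory() builder; a sketch of a bounded off-heap container (builder method names as I recall them for 9.0):

    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.configuration.cache.StorageType;
    import org.infinispan.eviction.EvictionType;

    Configuration config = new ConfigurationBuilder()
            .memory()
                .storageType(StorageType.OFF_HEAP)  // keep entries off the JVM heap
                .evictionType(EvictionType.COUNT)   // bound the container by number of entries
                .size(1_000_000)
            .build();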

Persistence


The JDBC cache store received quite an overhaul:

  • The internal connection pool is now based on HikariCP, for improved performance
  • Writes will now use database-specific upsert functionality when available
  • Transactional writes to the cache translate to transactional writes to the database
  • The JdbcBinaryStore and JdbcMixedStore have been removed as detailed here

We have also replaced the LevelDB cache store with the better-maintained and faster RocksDB cache store.

Ickle, our new query language


We decided it was time for Infinispan to have a proper query language, which would take full advantage of our query capabilities. We have therefore grafted Lucene's full-text operators on top of a subset of JP-QL to obtain Ickle. We have already started describing Ickle in a recent blog post. For a taste of Ickle, the following query shows how to combine a traditional boolean clause with a full-text term query:


    select transactionId, amount, description from com.acme.Transaction
    where amount > 10 and description : "coffee"

Cloud integrations


Infinispan continues to play nicely in cloud environments thanks to a number of improvements that have been made to discovery (such as KUBE_PING for Kubernetes/OpenShift), health probes and our pre-built Docker images.

Multi-tenant server and SNI support


Infinispan Server is now capable of exposing multiple cache containers through a single Hot Rod or REST endpoint. The selection of the container is performed via SNI. This allows you to have a single cluster serve all your applications while maintaining each one's data isolated.

Administration Console


The administration console has been completely rewritten in a more modular fashion using TypeScript to allow for greater extensibility and ease of maintenance. In addition to this refactor, the console now supports the following:

  • Stateless views
  • HTTP Digest Authentication
  • Management of individual and clustered Standalone server instances
  • Internet Explorer

Documentation overhaul


Our documentation has been completely overhauled with entire chapters being added or rewritten for readability and consistency.

What's coming


We will be blogging in more detail about some of the things above, so watch out for more content coming soon !


We've already started working on Infinispan 9.1 which will bring a number of new features and improvements, such as clustered counters, consistency checker with merge policies, a new distributed cache for even better write performance, and more.

Get it now !


Head over to our download page to get binaries, sources, clients, etc.

Please join us to let us know what you think about this release.


The Infinispan team

Wednesday, 22 March 2017

Infinispan 9.0.0.CR4

Dear Infinispan users, we thought CR3 was going to be the last candidate release before Final... but we were mistaken!
The reason for yet another CR is that we decided to make some changes which affect some default behaviours:
  • enabling optimistic transactions with repeatable read now turns on write-skew by default
  • retrieving an already configured cache by passing in a template doesn't redefine that cache's configuration
Other important changes:
  • big improvements to the client/server rolling upgrade process
  • allow indexes to be stored in off-heap caches
  • lots of bug fixes
For the full list of changes check the release notes, download the 9.0.0.CR4 release and let us know if you have any questions or suggestions.

Cheers,
The Infinispan team

Tuesday, 21 March 2017

Docker image security changes

In the latest 9.0.0.CR3 version, the Infinispan REST endpoint is secured by default, and in order to facilitate remote access, the Docker image has some changes related to security.

The image now creates a default user login upon start; this user can be changed via environment variables if desired:
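For example (APP_USER and APP_PASS are the variable names I recall for the jboss/infinispan-server image; check the image documentation to confirm):

    docker run -it -e APP_USER=user -e APP_PASS=changeme jboss/infinispan-server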

You can check if the settings are in place by manipulating data via REST. Trying to do a curl without credentials should lead to a 401 response:
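For example, assuming the REST port 8080 is exposed and a cache named "default":

    curl -i http://localhost:8080/rest/default/some-key

    HTTP/1.1 401 Unauthorized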

So make sure to always include the credentials from now on when interacting with the REST endpoint! If using curl, this is the syntax:
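A sketch, using the user and password from the variables above (add --digest if the endpoint is configured for digest authentication):

    curl -u user:changeme -H 'Content-type: text/plain' -d 'some value' http://localhost:8080/rest/default/some-key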

And that's all for this post. To find out more about the Infinispan Docker image, check the documentation, give it a try and let us know if you have any issues or suggestions!