This year’s JDD conference was held at the Galaxy hotel in Kraków over two days, 14 and 15 October. The lectures were given in three parallel sessions, with workshops on the second day. Below is a summary of the most interesting presentations.
Daniel Kostrzewa’s review
Bartek Zdanowski – Vert.x
During the presentation, Bartek (who works at TouK) presented Vert.x (http://vertx.io/), a scalable, distributed, polyglot, lightweight application platform. The platform (currently owned by the Eclipse Foundation) is built around an event bus.
According to the speaker, Vert.x is a good framework for writing multithreaded programs and makes writing applications extremely easy. It takes an interesting approach: exactly one thread runs constantly on each processor core. Thanks to that, the application is written as if it were single-threaded. The effect of multithreading is achieved without synchronized blocks, locks, or variables marked as volatile. The platform uses non-blocking I/O, which makes it possible to handle a huge number of connections with a small thread pool.
Vert.x also provides the event bus, which enables communication between modules running on the platform. It is worth noting that modules can be written in various languages, and Vert.x allows them to be connected into a working solution.
The pieces of code run by Vert.x are called ‘verticles’. These modules run inside a Vert.x instance, and each Vert.x instance runs in its own Java virtual machine. The platform can run many modules at the same time. Several Vert.x instances can run on one machine or on several machines communicating over the network. Those instances can be configured to form a cluster and communicate with each other over the aforementioned event bus.
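The idea of modules that talk over a bus and are each served by a single thread can be sketched in plain Java. This is only an illustration of the concept, not the Vert.x API: `TinyEventBus`, `CounterModule`, `register` and `send` are hypothetical names, and a single-threaded executor stands in for an event loop.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for an event bus; NOT the Vert.x API.
class TinyEventBus {
    // One single-threaded executor per address plays the role of an event loop.
    private final Map<String, ExecutorService> loops = new ConcurrentHashMap<>();
    private final Map<String, Consumer<String>> handlers = new ConcurrentHashMap<>();

    void register(String address, Consumer<String> handler) {
        handlers.put(address, handler);
        loops.put(address, Executors.newSingleThreadExecutor());
    }

    // Delivers the message asynchronously on the handler's own thread.
    // Assumes the address has been registered before.
    void send(String address, String message) {
        Consumer<String> handler = handlers.get(address);
        loops.get(address).execute(() -> handler.accept(message));
    }

    // Stops all loops and waits until the queued messages have been processed.
    void shutdown() {
        for (ExecutorService loop : loops.values()) {
            loop.shutdown();
            try {
                loop.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}

// A verticle-like module: plain fields and no locks,
// because only one thread ever calls handle().
class CounterModule {
    private int received; // deliberately neither volatile nor synchronized

    void handle(String message) { received++; }

    int received() { return received; }
}

class EventBusDemo {
    public static void main(String[] args) {
        TinyEventBus bus = new TinyEventBus();
        CounterModule counter = new CounterModule();
        bus.register("counter", counter::handle);
        for (int i = 0; i < 1000; i++) {
            bus.send("counter", "tick");
        }
        bus.shutdown();
        System.out.println(counter.received());
    }
}
```

The module code reads as if it were single-threaded; the concurrency lives entirely in the bus, which is the property the speaker emphasised.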
Bartek also presented performance tests comparing Vert.x and Node.js. Unfortunately, these tests were not carried out by the speaker himself but taken from the Vert.x project blog (http://vertxproject.wordpress.com/2012/05/09/vert-x-vs-node-js-simple-http-benchmarks/).
At the end of his presentation, Bartek used a few simple examples to show how to write multithreaded applications on the Vert.x platform.
It was a great pleasure to attend the presentation, and the topic left the audience eager to try this solution in their own projects.
Marcin Zajączkowski – Mutation testing – how good are your tests really?
In his presentation, Marcin discussed a very interesting testing method: mutation testing. The well-known code coverage metric (measured with various popular tools) only verifies whether some test ‘passed’ through a given line. Unfortunately, this is not always enough, because it tells us nothing about the quality of the tests.
According to the speaker, mutation testing can be a solution to this problem. The technique consists of changing a chosen line of the application code (introducing a mutation) and checking whether any automated test detects the change. Mutations that survive (are not detected by any test) mark the places where a potential application bug would go undetected.
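A hand-made example may make the idea more concrete. The sketch below is not taken from the talk: the `isAdult` rule, the mutant and both ‘tests’ are hypothetical, and the mutation is applied by hand instead of by a tool. The weak test executes the mutated line, so line coverage is complete, yet it cannot tell the original from the mutant.

```java
import java.util.function.IntPredicate;

class MutationDemo {
    // Production rule: adults are 18 or older.
    static boolean isAdult(int age) { return age >= 18; }

    // A typical mutant: the boundary operator >= has been changed to >.
    static boolean isAdultMutant(int age) { return age > 18; }

    // Weak test: runs the line (full line coverage) but skips the boundary value.
    static boolean weakTestPasses(IntPredicate impl) {
        return impl.test(30) && !impl.test(5);
    }

    // Stronger test: additionally checks the boundary value 18.
    static boolean strongTestPasses(IntPredicate impl) {
        return impl.test(30) && !impl.test(5) && impl.test(18);
    }

    public static void main(String[] args) {
        // A mutant survives when the test still passes against it.
        System.out.println("weak test, mutant survives: "
                + weakTestPasses(MutationDemo::isAdultMutant));
        System.out.println("strong test, mutant survives: "
                + strongTestPasses(MutationDemo::isAdultMutant));
    }
}
```

Both tests pass against the original code; only the stronger one fails against the mutant, that is, kills it.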
The speaker also highlighted the disadvantages of the method: long execution time, the need to modify the production code, and the possibility of infinite loops or stack overflows. Nevertheless, the biggest problem is that there are not many Java tools that support this testing strategy (many of them are no longer maintained).
Next, Marcin presented one of the best available and actively developed tools supporting mutation testing: PIT (http://pitest.org/).
In Marcin’s opinion, mutation testing should be used in particular when building systems whose key features are high quality and reliability.
Although the technique has been known in academia for 30 years, it seems to have been rediscovered and applied in some huge commercial solutions.
During the presentation, you could get the impression that mutation testing is an immature method. However, the strategy is so innovative and interesting that it is worth following its development.
Mateusz Chrobok’s review
Jarosław Pałka – You can have your data in cache too
Jarosław Pałka’s lecture was one of the first at this year’s JDD. The biggest conference room was bursting at the seams, which proved the huge interest in the subject. A brief introduction covered who the lecture was aimed at, what a cache mechanism is, and where you can come across one. The presentation was divided into three parts.
The first part was an introduction to cache terminology:
– cache eviction – the size of the cache has been exceeded, so some data must be deleted; the most popular algorithms for selecting the elements to delete are LRU, LFU, FIFO and LIRS,
– cache invalidation – data in the cache is no longer valid and the rest of the system must be informed about that; the notion is significant in distributed topologies; expensive and difficult to implement, therefore rarely implemented,
– cache expiry – the time after which an element is deleted from the cache; usually defined by two parameters: idle time and TTL (time to live),
– hit/miss ratio – an indicator of the cache’s efficiency; a well-designed mechanism can provide a ratio of around 80%,
– cache passivation (cache offloading) – the size of the cache has been exceeded, so evicted data is saved to disk; the data is still in the cache, but the access time is longer,
– topology – the following types of cache can be distinguished: local, remote, distributed, and hybrid (a combination of a local cache and a remote/distributed one).
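Two of the terms above, eviction and the hit/miss ratio, can be shown with a small local cache. The sketch is not from the lecture; `LruCache` is a hypothetical class built on `java.util.LinkedHashMap`, which supports LRU ordering through its access-order mode and the `removeEldestEntry` hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical local cache with LRU eviction and a hit/miss counter.
class LruCache<K, V> {
    private final int capacity;
    private final LinkedHashMap<K, V> entries;
    private long hits;
    private long misses;

    LruCache(int capacity) {
        this.capacity = capacity;
        // accessOrder = true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > LruCache.this.capacity; // cache eviction
            }
        };
    }

    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            misses++;
        } else {
            hits++;
        }
        return value;
    }

    void put(K key, V value) { entries.put(key, value); }

    // Does not count as an access, so it does not change the LRU order.
    boolean contains(K key) { return entries.containsKey(key); }

    double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}

class LruCacheDemo {
    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // hit; "b" is now the least recently used entry
        cache.put("c", "3"); // capacity exceeded, "b" is evicted
        cache.get("b");      // miss
        System.out.println(cache.contains("b") + " " + cache.hitRatio());
    }
}
```

Expiry, passivation and distributed topologies need considerably more machinery, which is where the products discussed in the second part of the lecture come in.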
The second part covered four available solutions. The order in which they were discussed was not random. The fastest but most basic mechanisms came first; each subsequent one was slower than the previous one but offered more functionality:
– memcached – the fastest solution, with only basic functionality; the assigned memory is divided into slabs, pages and chunks; no persistence layer; memcached servers are independent of each other; the client application uses a consistent hashing algorithm to distribute data across many memcached instances,
– redis – slower than memcached, allows setting an expiry time, supports replication and has two persistence options: append-only file and snapshotting,
– ehcache – similar to redis in efficiency and functionality, supports replication, distribution, simple transactional functionality, and cache offloading,
– infinispan – the most powerful and also the slowest of the discussed solutions, supports full transactional functionality, replication, cache offloading, consistent hashing in communication between nodes, has a persistence layer, Tree API, Query API, Group API.
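Consistent hashing, mentioned above for memcached clients and for infinispan nodes, can be sketched as a ring of hash points. This is a simplified illustration and not the algorithm of any particular client: `HashRing` and the server names are hypothetical, CRC32 is used only because it ships with the JDK, and each server is placed on the ring several times (virtual nodes) to spread the keys more evenly.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical client-side ring; real clients use their own hash functions.
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    // A key belongs to the first server point at or after its hash,
    // wrapping around to the start of the ring. Assumes a non-empty ring.
    String serverFor(String key) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}

class HashRingDemo {
    public static void main(String[] args) {
        HashRing ring = new HashRing(100);
        ring.addServer("cache-a");
        ring.addServer("cache-b");
        ring.addServer("cache-c");

        String[] before = new String[1000];
        for (int i = 0; i < 1000; i++) {
            before[i] = ring.serverFor("key" + i);
        }

        ring.removeServer("cache-c");
        int moved = 0;
        for (int i = 0; i < 1000; i++) {
            if (!before[i].equals(ring.serverFor("key" + i))) {
                moved++;
            }
        }
        System.out.println("keys moved: " + moved + " of 1000");
    }
}
```

The point of the design is that removing a server only remaps the keys that lived on it; with a plain `hash % serverCount` scheme almost every key would move.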
The last part focused on the places in a standard web application where cache mechanisms can be used. The following levels were presented:
– database level – the solutions used with Hibernate were presented, that is: the session-level cache, the second-level cache, and the query cache; attention was drawn to the problem of keeping SQL query results in memcached, because the keys are too long,
– backend level – avoid code that checks whether a given element already exists in the cache and puts it in if it was not found; use @Cacheable (Spring, JSR 107) instead,
– frontend level – servers such as Apache, Varnish, Nginx and Squid were mentioned, along with using the conditional GET to manage the cache on the client side.
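The backend-level advice is about keeping the ‘check the cache, otherwise compute and store’ logic out of business code. Without Spring on the classpath, the same separation can be approximated with a small wrapper. This is a plain-Java stand-in for the idea and not how @Cacheable is implemented; `Caching.cached` and `loadUserName` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class Caching {
    // Wraps any function so that callers never write
    // "look in the cache, else compute and put" themselves.
    static <K, V> Function<K, V> cached(Function<K, V> loader) {
        Map<K, V> cache = new ConcurrentHashMap<>();
        return key -> cache.computeIfAbsent(key, loader);
    }
}

class CachingDemo {
    static final AtomicInteger backendCalls = new AtomicInteger();

    // Hypothetical expensive lookup standing in for a database or remote call.
    static String loadUserName(Integer id) {
        backendCalls.incrementAndGet();
        return "user-" + id;
    }

    public static void main(String[] args) {
        Function<Integer, String> userName = Caching.cached(CachingDemo::loadUserName);
        userName.apply(7);
        userName.apply(7); // served from the cache, the loader is not called again
        System.out.println(backendCalls.get());
    }
}
```

The business method stays free of cache handling, which is the same effect the annotation gives declaratively.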
Nevertheless, we have to remember that putting too many cache mechanisms into an application may be dangerous, because control over the processed data can be lost.
The lecture was aimed rather at those who do not have extensive knowledge of cache mechanisms. The presentation was clear and easy to understand, and the lecture itself was very interesting. Only the ending may have left you wanting more, because the lecturer had a problem with a question related to one of the slides. You might have had the impression that he was not able to justify his answer. Nevertheless, it did not have a negative impact on the presentation as a whole.
Adrian Wilczek’s review
Jaroslav Tulach – Introduction to bck2brwsr project, 20 API Paradoxes
The project’s API offers various possibilities, such as drawing with the Canvas API, handling sound, geolocation, or WebSockets. It is also worth mentioning the ongoing work on compiling Java source code on the browser side.
Does this young project have a future? Unfortunately, only specially adapted code or Java libraries can be run this way: each unsupported operation will raise a security exception. That is one of the assumptions of the project. What is more, because of the nature of JavaScript, we have to forget about multithreading. It was also noticeable that the demo application took some time to start. Nevertheless, the project is extremely interesting and will probably find its supporters and some specific use cases.
The second lecture started with an explanation of the philosophical sense of the paradoxes presented in the book. Jaroslav compared them to the apparent cognitive paradoxes that arise when the results of scientific experiments do not fit the current model describing a phenomenon. Here, the Michelson-Morley experiment was cited, which in the 1880s showed that the speed of light is constant in every frame of reference; this unexpectedly overthrew the theory of a hypothetical ‘ether’ and eventually led to Einstein’s great discovery. The less we know about the topic we are investigating, the more paradoxes we find. Following that way of thinking, if we compare designing an API made available to other programmers with writing regular code, we can get ‘paradoxically’ different advice or guidelines on good design.
After this introduction, only a few of the twenty topics from the book were briefly covered. One of them was that the more freedom to change the system our API gives its users, the less freedom we leave ourselves to change the API in later versions. Another, more obvious one was that we have to be very careful when choosing the visibility of the methods of the classes we make available, so that API users cannot use them in ways we do not intend. The lecturer also claimed that a good technology (e.g. an API) rests on three factors. First, it has to be ‘cool’, to attract people’s attention. Second, it has to be efficient to use; ideally, people could use it without knowing much about it. Third, his work on an IDE taught him that it is equally important for an API to remain backward compatible as it evolves.
The co-creator of NetBeans advertised his book as interspersed with philosophy, paraphrasing a comment from one of the reviewers: ‘if you like philosophy you will be pissed, if you don’t – you will be bored’, and adding jokingly that it has also received many more favourable reviews ;).
Łukasz Kulig’s review
Nathan Marz – The Epistemology of Software Engineering
One of the presentations opening JDD was a talk by Nathan Marz, the creator of Storm, a tool for event processing in distributed systems. At the very beginning, Nathan stressed that the presentation would be philosophical rather than technical.
After a short introduction, Nathan stated that there was only one thing he wanted to pass on to the audience: ‘Your code is wrong’. You could feel slight astonishment in the room. How could he assume that our code is wrong if he had never seen it? We have precise use cases, our code fulfils all the criteria defined by the client, we use design patterns, our code is covered by tests and of course it is of high quality, so what is the matter?
After a moment of silence, Nathan explained what he meant. He started with a question, ‘How can you be sure that your code is good?’, and the answer was: you cannot be. The lecturer showed us a broader perspective: we cannot be sure that our code is good, because many different aspects influence the behaviour of the whole system we are working on.
He presented quite a short example: if the code we write retrieves data from the outside world, how can we be sure that the data is correct? It was produced somewhere else, so how can we be sure that it is exactly what it should be? Many different aspects influence the code, aspects which we are not able to foresee: process A influences process B, process B influences process C, and so on. Nathan went further and further: ‘How can you be sure that your hardware works properly? You cannot be, because there are many different aspects which influence the work of your hardware’.
Another example was the Agile process: there are iterations and sprints during which our code evolves. We change one part of the code and cut another; would we do that if our code were good? We may say that this is the purpose of Agile, but would we change the code to fix bugs that were there because we did not anticipate some situations if it were good?
The lecture was very thought-provoking; there were no technical aspects. Nathan drew our attention to quite a significant thing: if developers think that their code does not need any corrections, they stop improving themselves and stop striving for perfection, which they will never be able to reach.
Ivar Grimstad – From Spring Framework to Java EE 7
The Spring Framework is an alternative way of building enterprise-class Java applications. Spring was released in 2004 as an alternative to EJB, which imposed many limitations.
During his presentation, Ivar Grimstad showed how to migrate an application based entirely on Spring to pure Java EE.
The speaker began his lecture by presenting a simple, finished Spring application. Next, he divided the whole conversion process into a few steps: changes in entities, services, controllers, etc. Unfortunately, each step was more or less the same: a presentation of the Spring usage, a switch to a branch containing the already converted code, and a comparison of both solutions (it all boiled down to comparing the annotations of both technologies). This part of the presentation was not interesting, because what was presented can be read in any randomly chosen article comparing Spring and Java EE.
After presenting the whole application converted to Java EE, Ivar did not try to convince anyone that Java EE is better than Spring. He objectively presented when it is better to use Java EE than Spring. He referred, among other things, to how well the whole team knows a specific technology. He stated that it is not worth changing the framework in use if the team has extensive knowledge of and experience with it.
Both the form and the content of the lecture left a lot to be desired. The presentation would have been better if Ivar had changed the code manually instead of using prepared code branches. Additionally, he did not discuss the consequences, disadvantages and advantages of the changes introduced.
Paweł Benecki’s review
Daniel Ostrowski – Google BigQuery – uncovering programmers’ habits by interactive analysis of big datasets
Analysing large datasets is not a trivial problem. It requires huge computing power and advanced analysis algorithms. The key factor is query execution time: anything up to a few hours is usually acceptable. As always, the sooner, the better.
Building your own infrastructure and buying Business Intelligence class software is a serious investment, which under certain conditions can be avoided thanks to external tools.
For such uses, Google provides a paid tool, BigQuery, as part of its ‘cloud’ services. The product is integrated with Google Cloud Storage, which is used to store the data. BigQuery evolved from an internal Google project used, e.g., when calculating PageRank.
Compared with other data storage systems, what stands out is a physical organisation of data aimed at query performance: data from a single column is stored together, rather than whole table records as in the usual solution. In such an architecture it is hard to modify data, which is why it was decided that the service would be used only to analyse data, without the possibility of modification.
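The difference between the two layouts can be shown with a toy example. The sketch below only illustrates the general idea of column-oriented storage and says nothing about how BigQuery is implemented; the `CommitRow` and `CommitColumns` classes and their fields are hypothetical.

```java
// Row-oriented layout: every record keeps all of its fields together.
class CommitRow {
    final String language;
    final String message;
    final int filesChanged;

    CommitRow(String language, String message, int filesChanged) {
        this.language = language;
        this.message = message;
        this.filesChanged = filesChanged;
    }
}

// Column-oriented layout: one array per column;
// a record is the same index in each array.
class CommitColumns {
    final String[] language;
    final String[] message;
    final int[] filesChanged;

    CommitColumns(CommitRow[] rows) {
        language = new String[rows.length];
        message = new String[rows.length];
        filesChanged = new int[rows.length];
        for (int i = 0; i < rows.length; i++) {
            language[i] = rows[i].language;
            message[i] = rows[i].message;
            filesChanged[i] = rows[i].filesChanged;
        }
    }

    // Reads only the filesChanged column; language and message are never touched.
    long totalFilesChanged() {
        long total = 0;
        for (int value : filesChanged) {
            total += value;
        }
        return total;
    }
}

class ColumnarDemo {
    public static void main(String[] args) {
        CommitRow[] rows = {
            new CommitRow("Java", "fix build", 3),
            new CommitRow("Go", "add tests", 5),
            new CommitRow("Java", "hurray, it works", 1)
        };
        System.out.println(new CommitColumns(rows).totalFilesChanged()); // prints 9
    }
}
```

An aggregate over one column scans a single contiguous array, while changing one record means touching every column array, which matches the trade-off described above.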
The possibilities of BigQuery were shown using the GitHub Timeline public dataset. The dataset contains over 130 million records concerning commits to GitHub’s public repositories.
BigQuery queries are written in a specific SQL dialect, so anyone who knows this language is basically able to use the tool right away.
Among other things, the speaker analysed programmers’ ‘happiness level’ depending on the programming language used in their projects. The analysis was based on checking commit comments for phrases like ‘hurray’ and ‘hallelujah’, and for words expressing a more negative attitude.
It has to be mentioned that BigQuery cannot be used to analyse large amounts of arbitrary data; it must be possible to bring the data into relational form.
Google declares full preservation of users’ data. The price depends on the amount of data processed (more specifically, on the columns used in a query), on the size of the data in Cloud Storage, and on when the queries are executed: queries run on demand are more expensive than those queued for batch processing.
To sum up, the lecturer seems to have a vast knowledge of the discussed topics.
Marek Rudziński – Mind software – how is it possible to reach the agreement between people who think differently
A very interesting, non-technical lecture at the end of the first day. It offered a break from heavy programming topics. Issues from the field of psychology and interpersonal communication were raised. The aim of the presentation was to show the unconscious metaprograms that most people use on a daily basis.
After a short introduction, there was a film showing a conversation between the chairman of a company and the director of its IT department. Interestingly, the situation really took place. The chairman was not an actor; he played himself. Only the director was replaced by a consultant, but he was re-enacting a real situation. The film was paused from time to time to comment on the behaviours shown and to look for patterns.
The scene concerned the implementation of a new IT system in a big company. The action starts when the chairman tells the director that he has received a huge external grant for the project, so it can finally start. Here, the problems begin. Although both interlocutors agreed that, in that case, the project should be done, the whole conversation ended in a quarrel, and finally the director got fired. The cause of the problems was their different attitudes and the fact that neither of them even tried to acknowledge the other person’s point of view.
Marek called the procedures governing different aspects of life and communication ‘metaprograms’. Which program a given person uses is a result of many factors: inborn disposition, personal experience, the historical experiences of a group or a nation, or a cultural pattern. For example, according to the lecturer, as Poles we are more likely to be individualists who search for potential problems, and less likely to be team players who notice the opportunities among us.
The same person may use different programs in different situations, but usually there is a dominating one.
During the lecture different metaprograms were shown, representing contrasting approaches in the following aspects:
1. Opportunities – problems:
– noticing only the opportunities in a given task;
– noticing only the problems.
2. Aim – problem:
– aim: ‘I will do it to make things better and it will present us with new possibilities’;
– problem: ‘We have to do this finally, because it is problematic’, and also ‘Let’s not change anything, if it has been effective enough until now’.
3. Decision – source of beliefs:
– inner reference: ‘I know from my personal experience that (…) and this is the only right way to do this’, ‘I think I will do it my way’, not considering outer conditions that have changed;
– outer reference: considering others’ experience and information coming from outer conditions that have changed, even if your own experience was different.
4. The way of comparing and taking position on other people’s opinions:
– by elimination: using phrases like ‘it is impossible to agree with that’, ‘I do not say no’, looking for differences even if there are none;
– by assimilation: ‘I agree with you’, looking for similarities even if there are none.
Different approaches cannot be assessed in advance. To some extent, each approach is good in different situations. What is important is that you cannot constantly use only one of them.
A clash between two people who stick to their metaprograms gives bad results, because the emotions it reveals obscure the content of the issues discussed. Consciously recognising the interlocutor’s strategy and adapting to it to some degree can often lead to an agreement.
The form of the lecture was very interesting, the delivery was very eloquent, and the lecturer’s radio voice was extremely pleasant to listen to.