Imediava's Blog


Tag Archives: Groovy

Data binding in Grails – The basics

Grails data binding is a simple tool that becomes really useful when you have to assign values from request parameters to domain objects. Thanks to data binding, assigning a whole bunch of properties can be done in one line of code:

Book book = new Book(params)

If the domain object already exists the equivalent is:

book.properties = params

If we have a domain object with many properties and associations, this can save loads of boilerplate code. As an example we are going to take a Book domain class that has the following properties:

String name
String isbn
Date dateOfRelease
Person author

Without data binding, assigning all those properties would mean having to do:

book.name = params.name
book.isbn = params.isbn
book.dateOfRelease = Date.parse("dd-MM-yyyy", params.dateOfRelease)

It is important to notice that, to bind the parameters of a web request (for example the result of a form submission) automatically, the fields sent through the form need to use the same names as the properties of the domain class. Applying this simple convention, which also helps to keep things consistent, the process of gathering the result of a web request and creating a new domain object can be reduced to one statement. No need to set the properties one by one.
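The Grails binder is only available inside a Grails application, but plain Groovy's map constructor gives a rough feel for the naming convention. This is a minimal sketch under stated assumptions: the Book class is trimmed down to String properties (plain Groovy does no String-to-Date conversion here), the params map is a hand-made stand-in for request parameters, and plain Groovy fails on keys that have no matching property.

```groovy
// Trimmed-down stand-in for the domain class (assumption: String properties only)
class Book {
    String name
    String isbn
}

// Hand-made stand-in for the request parameters; keys match the property names
def params = [name: 'Some title', isbn: 'demo-isbn']

// New object: Groovy's map constructor sets every property named in the map
def created = new Book(params)

// Existing object: copy the values property by property
def existing = new Book()
params.each { key, value -> existing[key] = value }

println "${created.name} / ${existing.isbn}"
```

The loop at the end is what `book.properties = params` spares you from writing in Grails.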

In the case of one-to-one associations data binding is even more useful because it avoids having to create the associated domain objects by hand. Let’s say our Book class has an author property whose type is Person. If we didn’t have data binding, setting this property would mean having to do:

def author = new Person()
author.name = "John"
book.author = author

While with data binding, if our parameters map had this content:

params["author.name"] = "John"

The Person object would be created with its name and assigned to the book.author property automatically.
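To make the idea of dotted keys more concrete, here is a small sketch in plain Groovy. The bind helper is hypothetical and is not how Grails implements its binder; it only illustrates how a key such as "author.name" can be resolved against an object graph, creating the intermediate Person on demand. The classes are simplified stand-ins for the domain classes.

```groovy
class Person { String name }
class Book { String name; Person author }

// Hypothetical helper, NOT the Grails binder: it only illustrates how a
// dotted key such as "author.name" can be resolved against an object graph.
def bind(Object target, Map<String, Object> params) {
    params.each { String key, Object value ->
        List<String> path = key.tokenize('.')
        def current = target
        // Walk every segment but the last, creating missing objects on the way
        path.take(path.size() - 1).each { String prop ->
            if (current[prop] == null) {
                Class type = current.hasProperty(prop).type
                current[prop] = type.newInstance()
            }
            current = current[prop]
        }
        // The last segment names the property that receives the value
        current[path.last()] = value
    }
    target
}

def book = bind(new Book(), ['name': 'Some title', 'author.name': 'John'])
println "${book.name} by ${book.author.name}"
```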

In this first article we’ve seen how data binding helps to avoid writing boilerplate code, notably simplifying the task of creating domain objects. In following episodes we’ll see how data binding works for to-many associations and other benefits of this approach, such as automatic conversion from strings to the appropriate data type thanks to Spring’s property editors.

Web Scraping with Groovy (3 of 3) – JSoup

In previous articles we’ve had a look at how to use Groovy [4] and Groovy + XPath [5] for scraping web pages. In this one we are going to see how the JSoup library can make it even easier.

Jsoup

Jsoup is a very powerful Java library I have just recently discovered. As a Java library, it can be used with any JVM language, so we are going to use it with Groovy, thus benefiting from the features of both.

With Jsoup it is really easy to fetch and parse a URL; we just need one convenient method. The code to get the URL for the example we’ve been using in the previous articles is as simple as this:

import org.jsoup.Jsoup
import org.jsoup.nodes.Document

@Grapes( @Grab('org.jsoup:jsoup:1.6.1'))
Document doc = Jsoup.connect("http://www.bing.com/search?q=web+scraping").get();

We just declare our dependency on the Jsoup library (thanks to Grape) and then call the connect method of the Jsoup class. This creates a Connection object whose parameters can be modified to allow things like setting cookies on it. After creating the Connection object, calling its get method will actually retrieve the webpage, parse it as a DOM and return a Document object.

CSS selectors

JSoup’s most important feature is that it allows us to use CSS selectors, a way to identify parts of a webpage that should be familiar to any JQuery or CSS user. CSS selectors are in my opinion the best existing way to filter elements in a web page.

With the Document object we got before, the full code for filtering the links of interest for our example would be:

def results = doc.select("#results h3 a")

As you can see, by calling the select method we can use the same selector we would use with JQuery, which makes the query really easy.
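To try the selector without depending on the live search page, the same select call can be run against an inline HTML string. This is only a sketch: the markup below is a made-up stand-in (the real results page is structured differently), and it assumes Grape can download the same Jsoup version used above.

```groovy
@Grab('org.jsoup:jsoup:1.6.1')
import org.jsoup.Jsoup

// Made-up stand-in for a results page (assumption: the real markup differs)
def html = '''
<div id="results">
  <ul>
    <li><h3><a href="http://example.com/1">First result</a></h3></li>
    <li><h3><a href="http://example.com/2">Second result</a></h3></li>
  </ul>
</div>
<h3><a href="http://example.com/other">Outside the results</a></h3>
'''

def doc = Jsoup.parse(html)

// Same selector as above: links inside h3 elements inside #results
def results = doc.select('#results h3 a')

def titles = results.collect { it.text() }
def hrefs = results.collect { it.attr('href') }
titles.each { println it }
```

The link outside the #results element is not matched, which is exactly what the descendant selector is meant to guarantee.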

To wrap up, here is a summary of the advantages of Jsoup:

Summary

Pros:
Simplifies URL fetching to the extreme (just one method).
Facilitates the use of cookies.
Allows the use of “CSS” selectors known by any JQuery user. In my opinion the best way to select an element or a list of elements in a webpage. (For other similar opinions see references [1] [2] [3]).

Cons:
XPath filtering is more standardized.

To sum up, Jsoup is somewhat recent but comes with features that make it, in my opinion, the best Java library for web scraping. I recommend anyone interested in scraping with Java to visit Jsoup’s page, which is full of good examples of how to use the library.

Nonetheless, I encourage everyone to share their opinion about which one is the best Java library for web scraping.

Links

Links to comparisons of XPath and CSS selectors:

[1] http://ejohn.org/blog/xpath-css-selectors/
[2] http://chrisfjay.blogspot.com/2007/08/css-and-xpath-selectors.html
[3] http://saucelabs.com/blog/index.php/2011/05/why-css-locators-are-the-way-to-go-vs-xpath/

Previous articles about web scraping with groovy:

[4] https://imediava.wordpress.com/2011/08/18/web-scraping-with-groovy-1-of-3/
[5] https://imediava.wordpress.com/2011/08/30/web-scraping-with-groovy-2-of-3/

Edited 22/10/2011: Grab with multiple named parameters has been replaced by the more concise version with only one parameter as suggested by Guillaume Laforge.

Web Scraping with Groovy 2 of 3 – XPath

In the previous article, Web Scraping with Groovy 1/3, we talked about how we could use Groovy features to make web scraping easy. In this one, we’ll exploit Java/Groovy interoperability, using some additional Java libraries to simplify the process even further with XPath.

We are going to keep using the same practical example as in the previous article, which consisted of fetching http://www.bing.com/search?q=web+scraping and obtaining the result titles that match $('#results h3 a').

Web Scraping with XPath

URL fetching can be done exactly like in the previous article; parsing, however, needs to be completely modified. The reason is that Java’s XPath support works on DOM documents, yet I still haven’t found any HTML DOM parser that can be used with Java XPath. On the other hand, there are many HTML SAX parsers available, like the popular TagSoup, which we already used in the first post.

After considerable effort, the only solution I have found is the one provided at Building a DOM with TagSoup. Adapted to our example the code looks like the following:


import org.ccil.cowan.tagsoup.Parser;
import org.xml.sax.XMLReader;
import org.xml.sax.InputSource;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.*

def urlString = "http://www.bing.com/search?q=web+scraping"
URL url = new URL(urlString);

@Grapes( @Grab('org.ccil.cowan.tagsoup:tagsoup:1.2') )
XMLReader reader = new Parser();
//Transform SAX to DOM
reader.setFeature(Parser.namespacesFeature, false);
reader.setFeature(Parser.namespacePrefixesFeature, false);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
DOMResult result = new DOMResult();
transformer.transform(new SAXSource(reader, new InputSource(url.openStream())), result);

With the parsed HTML we can now use the expressiveness of XPath to filter elements in the page’s DOM. XPath allows better selection than GPath, in a declarative way, and has the benefit of being a standard that can be ported easily to other programming languages. To select the same elements as in the first examples we just need:

def xpath = XPathFactory.newInstance().newXPath()

//JQuery selector: $('#results h3 a')
def results = xpath.evaluate( '//*[@id=\'results\']//h3/a', result.getNode(), XPathConstants.NODESET )

Simulating the ‘#’ operator with XPath is quite complex compared with the simplicity of JQuery selectors. However, XPath is powerful enough to express anything that can be expressed with them, and it comes with its own advantages, such as the possibility of selecting all elements that have a child of a specific type. For example:

'//p[a]' // Selects all "p" elements that have an "a" child element

That is something standard CSS selectors could not express when this was written.
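Both expressions can be tried with nothing but the JDK if the input is already well-formed XML. The following is a minimal sketch under that assumption: the inline XHTML is a made-up stand-in for a results page, and it skips the TagSoup step that real-world HTML would need.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Made-up, well-formed stand-in for a results page
def xhtml = '''<html><body>
  <div id="results">
    <ul>
      <li><h3><a href="http://example.com/1">First result</a></h3></li>
      <li><h3><a href="http://example.com/2">Second result</a></h3></li>
    </ul>
  </div>
  <p>Paragraph with <a href="#">a link</a></p>
  <p>Paragraph without links</p>
</body></html>'''

def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
def dom = builder.parse(new ByteArrayInputStream(xhtml.getBytes('UTF-8')))
def xpath = XPathFactory.newInstance().newXPath()

// Equivalent of the JQuery selector $('#results h3 a')
def links = xpath.evaluate("//*[@id='results']//h3/a", dom, XPathConstants.NODESET)
def titles = (0..<links.length).collect { links.item(it).textContent }

// Only the "p" elements that have an "a" child
def paragraphsWithLinks = xpath.evaluate('//p[a]', dom, XPathConstants.NODESET)

titles.each { println it }
println "Paragraphs containing a link: ${paragraphsWithLinks.length}"
```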

Summary

Pros:
Very powerful and capable of covering any filtering need.
Less verbose than GPath.

Cons:
Needs a hack to allow using HTML parsing with the Java SDK’s XPath support.
It’s less prepared for HTML, which makes it more verbose than CSS selectors for operators like ‘#’ or ‘.’.

Next

In the next article, the last of this series, I will talk about JSoup, a library that I have just recently discovered but which offers, in my opinion, the best alternative. We will see not only how this library simplifies element filtering but also how it comes with additional features to make web scraping even easier.

Edited 22/10/2011: Grab with multiple named parameters has been replaced by the more concise version with only one parameter as suggested by Guillaume Laforge.

Web Scraping with Groovy (1 of 3)

Web Scraping

Web scraping consists of extracting information from a webpage in an automatic way. It combines URL fetching and HTML parsing. As an example for this article we are going to extract the main titles of the results of searching “web scraping” in Microsoft’s Bing.

As a reference for the article, searching “web scraping” with Bing is equivalent to accessing the following URL: http://www.bing.com/search?q=web+scraping

And the results’ titles are selected applying the following JQuery selector to the webpage’s DOM:

$('#results h3 a')

Scraping with Groovy

Groovy features make screen scraping easy. URL fetching in Groovy makes use of Java classes like java.net.URL, yet it’s made easier by Groovy’s additional methods, in this case withReader.

import org.ccil.cowan.tagsoup.Parser;
    
String ENCODING = "UTF-8"

@Grapes( @Grab('org.ccil.cowan.tagsoup:tagsoup:1.2') )       
def PARSER = new XmlSlurper(new Parser() )

def url = "http://www.bing.com/search?q=web+scraping"

new URL(url).withReader (ENCODING) { reader -> 

    def document = PARSER.parse(reader) 
    // Extracting information
}

HTML parsing can be done with any of the many available HTML-parsing Java tools like TagSoup or CyberNeko. In this example we have used TagSoup, and we can see how easily we can declare our dependency on the library thanks to Grapes.

On top of that, Groovy’s XmlSlurper and GPath allow us to access specific parts of the parsed HTML in a convenient way. For the example of the article we would just need a line of code to extract the titles of the search results:

//JQuery selector: $('#results h3 a')
//Example 1
document.'**'.find{ it['@id'] == 'results'}.ul.li.div.div.h3.a.each { println it.text() }
//Example 2
document.'**'.find{ it['@id'] == 'results'}.'**'.findAll{ it.name() == 'h3'}.a.each { println it.text() }

In the snippet I have provided two different ways of achieving the same goal.

For both examples we first use Groovy’s ‘**’ to search depth-first through all of the document’s descendants; this way we can find the one whose id is results.

Then for the first example we specify the full element path from the results element to the links that represent the titles. As we can see this is less handy than just saying “I want all h3 descendants” the way it is done with JQuery.

The second example does exactly that: using the ‘**’ operator it asks for all elements of type h3. However, if we keep comparing it with the way it is done with JQuery, the solution is still quite complex.
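Both approaches can be tried without any network access or TagSoup by feeding XmlSlurper a well-formed snippet. This is a sketch under stated assumptions: the inline markup is a made-up stand-in that is simpler than the real results page (so the full path is shorter than in Example 1), and the import targets Groovy 3 or later; on older versions XmlSlurper lives in groovy.util and the import can be dropped.

```groovy
import groovy.xml.XmlSlurper // Groovy 3+; older versions find XmlSlurper without this import

// Made-up, well-formed stand-in for the parsed results page
def xhtml = '''<html><body>
<div id="results">
  <ul>
    <li><h3><a href="http://example.com/1">First result</a></h3></li>
    <li><h3><a href="http://example.com/2">Second result</a></h3></li>
  </ul>
</div>
</body></html>'''

def document = new XmlSlurper().parseText(xhtml)

// Depth-first search for the element whose id is "results"
def results = document.'**'.find { it.@id.text() == 'results' }

// Full path from the results element, as in Example 1
def byPath = results.ul.li.h3.a.collect { it.text() }

// Any-depth search for h3 elements, as in Example 2
def byDepthSearch = results.'**'.findAll { it.name() == 'h3' }.collect { it.a.text() }

byPath.each { println it }
```

The first variant breaks as soon as the markup between the results element and the h3 changes, while the second keeps working.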

Summary

Pros:
Easy URL fetching thanks to withReader.
Parsing simplified thanks to XmlSlurper, and Grapes for declaring dependencies.

Cons:
Verbose for filtering descendants at lower levels.
Filtering based on id, class or attributes is complex compared with (#, ., or [attribute=]) in JQuery.

To sum up, we have seen that web scraping is made easier thanks to Groovy. However, it comes with some inconveniences, above all if we compare it with how easy it is to select elements with JQuery selectors.

In my next post I’m going to explore other libraries that simplify element filtering, providing support for things like XPath or even CSS selectors.

PS: This example’s code is really simple, but if you still want to access it, it is available at this gist

PS2: This set of articles is now going to be three articles long, with the first dedicated to GPath, the second to XPath and the last to the most interesting of them all in my opinion, JSoup.

Edited 22/10/2011: Grab with multiple named parameters has been replaced by the more concise version with only one parameter as suggested by Guillaume Laforge.

Restore files from local history with Eclipse API

If you work with Eclipse you probably know that it keeps a copy of every modification made to a file. This is thanks to a backup system Eclipse guys call the Local history. This system is a really useful feature that can be a real lifesaver in some cases. It’s true that the copies are only kept for a relatively short period of time, but that can be configured by changing the Eclipse preferences.

However my interest is not to talk about its features, but to explain how the Local history system can be used when developing Eclipse plugins to implement undo capabilities for your actions.

Suppose your plugin has an action that modifies more than one file of a user’s project and you want to give the user the chance to undo that action. In that case, just by reverting every modified file to the state it had before the action’s date using the local history, you would have an undo system ready. As simple as that. There is no need to develop the opposite action or to manually store the system state to go back to it.

An example of restoring a file to its immediately previous state in the local history is the following one, coded with GroovyMonkey:

/*
 * Menu: Remove Markers
 * Script-Path: /GroovyMonkeyScripts/monkey/historial_ficheros.gm
 * Kudos: ERVIN
 * License: EPL 1.0
 * DOM: http://groovy-monkey.sourceforge.net/update/plugins/net.sf.groovyMonkey.dom
 */

import org.eclipse.core.resources.*
import org.eclipse.jdt.core.JavaCore
import org.eclipse.jdt.core.IPackageFragmentRoot
import org.eclipse.core.runtime.Path

workspace.root.projects.each { project ->

   //selects only the java projects
   if (project.isOpen() && project.isNatureEnabled("org.eclipse.jdt.core.javanature")){
 	   javaProject = JavaCore.create(project)
	   roots = javaProject.getPackageFragments()
	   
	   // Filters the source code packages
	   .findAll{ fragment -> fragment.getKind() == IPackageFragmentRoot.K_SOURCE}
	   
	   .each { fragment ->
	 	    fragment.compilationUnits.each {
			    file = it.resource
			    file.setContents(file.getHistory(null)[0].
                                         contents,IFile.KEEP_HISTORY, null)
			    file.refreshLocal(IResource.DEPTH_INFINITE, null)
	 	    }
		
	   }
	 
   }
}

Basically, what this code does is take all the open Java projects in Eclipse and restore the content of every source code file to the last state in its history.

The most interesting parts of the snippet are the following two lines:


file.setContents(file.getHistory(null)[0].
                       contents,IFile.KEEP_HISTORY, null)

file.refreshLocal(IResource.DEPTH_INFINITE, null)


The first line takes the first element from the file’s history. Since the getHistory() method returns an ordered array, the first element corresponds to the last state of the file in its history. Then it takes the last state’s content and assigns it to the file, thus restoring the file’s content to its previous state.

The second line is just in charge of making Eclipse aware of the file’s change.
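For the undo scenario described earlier, the state to restore is not necessarily the first one but the newest one saved before the action started. The selection logic can be sketched in plain Groovy, outside Eclipse. Everything here is a stand-in: the maps only mimic file states that carry a modification time and some content, ordered newest first, and stateBefore is a hypothetical helper, not part of the Eclipse API.

```groovy
// Hypothetical stand-in for a file's local history, newest state first
def history = [
    [modificationTime: 300L, contents: 'after the action'],
    [modificationTime: 200L, contents: 'just before the action'],
    [modificationTime: 100L, contents: 'much older'],
]

// Picks the newest state saved strictly before the action started;
// returns null when the history holds no state that old
def stateBefore = { List states, long actionTime ->
    states.find { it.modificationTime < actionTime }
}

def actionTime = 250L
def restored = stateBefore(history, actionTime)
println restored.contents
```

Inside a plugin the same idea would be applied to the array returned by getHistory(), followed by setContents and refreshLocal as in the snippet above.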

Prototyping SWT with GroovyMonkey

As part of an Eclipse plugin I’m developing for my Master’s degree project, lately I’ve had to deal with SWT. Coming from some background in developing GUIs with Swing and Windows Forms, I have to admit that SWT is not easy to learn. Apart from being a bit counterintuitive, I find it too verbose.

So far, whenever I wanted to develop a new Wizard or View I had to code it and play with layouts and widgets till I got the look I wanted.

On the other hand, just quite recently I’ve discovered that there’s a scripting environment for Eclipse called GroovyMonkey that can help with the task of prototyping with SWT.

To play with it, I’m gonna use an example to compare the code you need to write to create a really simple interface with both approaches to show GroovyMonkey benefits over Java SWT. The example I’m gonna use is extracted from the installation of GroovyMonkey in Eclipse. We’re gonna create the following shell:

[Image: Example Shell]

The code for this shell with SWT and Java is quite long for what it does, not too elegant and prone to error.

@Override
public void createPartControl(final Composite parent) {

     parent.setLayout(new GridLayout());
     group = new Group(parent, SWT.NONE);
     group.setText("Groovy SWT");
     group.setBackground(parent.getShell().getDisplay().getSystemColor(SWT.COLOR_WHITE));
     group.setLayout(new GridLayout());

     Label label1 = new Label(group, SWT.NONE);
     label1.setText("groove fun !" );
     label1.setBackground(parent.getShell().getDisplay().getSystemColor(SWT.COLOR_WHITE));

     Label label2 = new Label(group, SWT.NONE);
     label2.setText("Email: ckl@dacelo.nl");
     label2.setBackground(parent.getShell().getDisplay().getSystemColor(SWT.COLOR_WHITE));

}

The code we would need with GroovyMonkey is shorter and simpler to understand, advantages whose impact grows with more complex user interfaces.

 def subapp = jface.shell( window.getShell())
 {
 	gridLayout()
    group( text:"Groovy SWT", background:[255, 255, 255] )
    {
    	gridLayout()
        label( text:"groove fun !" ,background:[255, 255, 255] )
        label( text:"Email: ckl@dacelo.nl", background:[255, 255, 255] )
    }
}

So, those features have made me decide to give GroovyMonkey a try as my tool for quick GUI prototyping with SWT. In the following section, I’m gonna share my experience through a small list of tips I myself find useful.

Since I’m a complete beginner with GroovyMonkey and with Groovy in general, some of the tips may seem extremely obvious to somewhat experienced users. Nonetheless, it hasn’t been easy for me to find information about how to start with GroovyMonkey. Thus, this article is written with the hope that it can somehow be useful to those like me who have never used GroovyMonkey before and need help in their very first steps.

Tips

Check GroovySWT documentation

This applies to you especially if you’re using GroovyMonkey for prototyping SWT GUIs like I’m doing. The GroovySWT webpage provides a short explanation of the way the library works. If you wanna dig deeper, the examples it provides should become your best source.

How to assign values to fields

To assign a value to a field you can take two approaches: either you use the classical “set” method provided by the Java API or you pass the value as a named parameter to the object constructor.

The first way is more flexible since it allows you to change the value of the object at any point of the execution. The code to use this approach for setting the text of a label in the example shell is similar to the Java code:

   miLabel.setText("groove fun !")

On the other hand, the second way is more concise and it’s the preferable way to assign values when creating objects.

   text(text : "groove fun !")

It’s important to point out that this way of assigning values to properties can be used for any property that follows the JavaBeans naming convention. As an illustration we are gonna use the same approach to assign layout data to a control. This can be done since the method setLayoutData exists for any control. The following snippet shows how to do it.

import org.eclipse.swt.layout.GridData

text(style: "Border", layoutData: gridData(grabExcessHorizontalSpace : true, horizontalAlignment : GridData.FILL, verticalAlignment : GridData.FILL))

The last example also shows that this approach can be used recursively, thus increasing its benefits.
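Why named parameters work for anything that has a setter can be seen in plain Groovy, without SWT. The sketch below uses a made-up FakeLabel class (an assumption for illustration, not an SWT widget) that counts how often its setter runs, to show that the named-parameter form ends up calling the same setText method as the explicit call.

```groovy
// Made-up stand-in for a widget; it only records how often the setter runs
class FakeLabel {
    private String text
    int setterCalls = 0

    void setText(String value) {
        setterCalls++
        text = value
    }

    String getText() { text }
}

// First approach: explicit "set" method, usable at any point of the execution
def viaSetter = new FakeLabel()
viaSetter.setText('groove fun !')

// Second approach: named parameter at construction time;
// Groovy creates the object and then calls setText for us
def viaNamedParam = new FakeLabel(text: 'groove fun !')

println "${viaSetter.text} (${viaSetter.setterCalls}) / ${viaNamedParam.text} (${viaNamedParam.setterCalls})"
```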

Styles for controls

The way styles are assigned to controls is also changed with GroovyMonkey.

GroovyMonkey takes SWT.NONE as the default value, so it doesn’t need to be stated explicitly when creating a control.

When other style values need to be assigned, the way to do it is passing them through the named parameter style. The value for this parameter is a string which contains the list of styles we wanna set, separated with commas. To represent each style we use its name stripped of the SWT prefix. Case is ignored, so any representation with the same letters as the style name is accepted by GroovyMonkey.

As an example this is the way to create a multiline text field with a border:


text ( style: 'Border, Multi')
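How such a style string could be turned into a single style value is easy to sketch in plain Groovy. This is not GroovyMonkey's actual implementation: parseStyle is a hypothetical helper, and STYLE_FLAGS is a tiny stand-in table of bit flags used here instead of the real org.eclipse.swt.SWT constants.

```groovy
// Stand-in bit flags (assumption: real code would read them from org.eclipse.swt.SWT)
def STYLE_FLAGS = [NONE: 0, MULTI: 1 << 1, BORDER: 1 << 11]

// Hypothetical helper: splits on commas, ignores case and surrounding
// spaces, and combines the flags with a bitwise OR
def parseStyle = { String spec ->
    def names = spec.split(',').collect { it.trim().toUpperCase() }.findAll { it }
    names.inject(STYLE_FLAGS.NONE) { mask, name -> mask | STYLE_FLAGS[name] }
}

println parseStyle('Border, Multi')
println parseStyle('border,MULTI')
```

An empty string yields the NONE value, which matches the idea of a default style that never has to be written down.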

So that’s all for now. Maybe I’ll come back with more tips for GroovyMonkey users when I have more experience with it.
