Monday, July 15, 2013

Script: Scatter.gs; Easily create scatter plots with regression lines in GrADS

This script not only makes it possible, but easy to plot 2D scatter or line plots in GrADS using user specified arrays. That is, there is no requirement than an outside file be open, and the 'display' command is not needed or used.  Now, for most things, it is probably easier to use matlab, python, idl or even excel to do graphs like these.  However, I can certainly see the advantages of this script if you are accessing large data sets right through the NOMADS data server, or the GrADS data server.

Example usage:

     xarray='1 2 3 4 5 6 7 8 9 10'
     yarray='1 2 3 4 5 6 7 8 9 10' 
     'scatter -x 'xarray' -y 'yarray

Depending on the options used, your plot will look similar to the one below.

Fig.1: Example Scatter Plot, With regression line and equation

Note: you must include both the -x and -y to specify your arrays, and your arrays must take a form similar to the ones listed above.

There are numerous options to customize your plot, they are listed in the script under "default options at the top."  Additionally, most of these options can be set manually when calling the function.

To find out more about these options, use the help page associated with the script:

    'scatter -help'

The main example I will show here, is basically, taking the time-series surface temperature and the surface RH at a given point, and then making a scatter plot of the two variables.

As always, we will start by opening the file:
 
    'sdfopen http://nomads.ncep.noaa.gov:9090/dods/gfs_hd/gfs_hd20130715/gfs_hd_00z'

Once the file is open, we clear the screen and pick a point and then set gxout to print, since are are just going to populate our arrays with numbers.
 
      'clear'
      'set lat 40'
      'set lon -90'
      'set gxout print'



Now, we need to start populating our arrays!  This isn't too hard we just set our initial arrays = '' (blank), and then loop through all time steps (65 for GFS) and concatenate as we go, being sure to leave a space between each new array variable.

    t=1
    xarray=''
    yarray=''

    while(t<=65)
      'set t 't


      'd tmp2m'
      tmp=sublin(result,2)
      tmp=subwrd(tmp,1)-273
      xarray=xarray' 'tmp

      'd rh2m'
      tmp1=sublin(result,2)
      tmp1=subwrd(tmp1,1

      yarray=yarray' 'tmp1

      t=t+1
   endwhile


Now, we have our arrays, so all we need to do is call scatter...

'scatter -x 'xarray' -y 'yarray' -xmin 15 -ymin 50 -ymax 100 -xmax 35 -xpg 1.5 9.5 -ypg 1 7.4 -box'

So, as you can probably see, this call uses several of the options available in the script.

    -xmin: Manually sets minimum x-axis value
    -ymin: Manually sets minimum y-axis value
    -xmax: Manually sets maximum x-axis value
    -ymax: Manually sets maximum y-axis value
    -xpg: Sets page limits in x
    -ypg: Sets page limits in y
    -box: Draws box around plot, instead of two axis as shown in Fig.1.

This will plot something similar to the plot below.  You will be prompted in the script to place the equation, and type in the titles (unless you have default titles set).

Fig.2: Surface Temp vs Relative Humidity, with regression shown
This plot appears more or less as you might expect, RH is inversely proportional to temperature (Think Clausius), and while this relationship isn't physically linear, you still get a decent correlation here.

Remember, You will be prompted to place the equation and your titles, so be sure to read the console as you work with this script!

Lastly, there are several options that exert varying degrees of control over your plot (just a few are listed above), including number of ticks on each axis, font/symbol/line color and style, labels, etc., etc.,  So be sure to bring up the help page to get an idea of how to set different options.  Most of the time, this type of analysis is best left to matlab, or something, if you are used to GrADS, this might cut out having to save data to an outside file just to do this.  Who knows, hopefully you all find it useful!

Remember, this is version 1.0, so bugs are certainly possible, if not likely, please report them here!

Download Scatter.gs

Download example script


8 comments:

  1. HI there, I was studying the precipitation over the coast in Peru and found that the sea temperature over the Pacific ecuatorial basin and the precipitation have an exponencial distribution in 30 year-period of time used for the study so, I would like to know if there's possible to use some script or function that let me draw this distribution spacially, like your example. I hope you could help with it. Thanks in advance.

    ReplyDelete
    Replies
    1. I unfortunately do not have any scripts put together that try to do anything beyond simple linear regression ... that sounds like a bit too much math for me. However, if it is an exponential relationship, you could always try taking the natural log of one of your variables. This would be make your exponential relationship linear.

      Delete
  2. Hi, I am trying to plot azimuthally averaged vertical structure of a cyclone. I read the parameter value at every individual point, calculate the azimuthal average and write to a text file along with the co-ordinates(radius,height). This text file is read in matlab and plotted(shaded plot with x-axes depicting radial distance and Y-axes depicting height). Is there a way to do this in grads itself ? could you please help me in this regard.

    ReplyDelete
  3. Hi,

    I tried to use your script. It shows this error:
    ga-> run test_file.gs
    ---------------------------------
    You have not included two arrays.
    Exiting Script...
    For help type: "scatter -help"
    ---------------------------------
    Unknown command: scatter

    I have two variables in my control file: temperature and precipitation for 45 time steps.

    This is my test_file.gs

    'xdfopen ./gddPCP.ctl'
    'set lat 25'
    'set lon -90'
    'set gxout print'
    t=1
    xarray=''
    yarray=''

    while(t<=45)
    'set t 't
    'd tasmin'

    tmp=sublin(result,2)
    tmp=subwrd(tmp,1)

    xarray=xarray' 'tmp

    'd pr'
    tmp1=sublin(result,2)
    tmp1=subwrd(tmp1,1)

    yarray=yarray' 'tmp1

    t=t+1
    endwhile

    'run /research-home/scatter.gs'
    'scatter -x 'xarray' -y 'yarray' -xmin 200 -ymin 0 -ymax 20 -xm
    ax 1200 -xpg 1.5 9.5 -ypg 1 7.4 -box'

    Do you have any suggestion ?

    Sincerely

    ReplyDelete
    Replies
    1. I think you are just calling it wrong. I think you need to combine the following lines into a single line:

      'run /research-home/scatter.gs'
      'scatter -x 'xarray' -y 'yarray' -xmin 200 -ymin 0 -ymax 20 -xm

      'run /research-home/scatter -x 'xarray' -y 'yarray' -xmin 200 -ymin 0 -ymax 20 -xm

      Good luck!

      Delete
  4. Hi There,
    Thank you for the earlier reply. Can you please tell me how to change the color and shape of the scatter marks ? I am trying to plot different datasets in one plot. It would be easier to use different color of scatter marks and regression line. Where in the script do I include these command ? For example, in the above code where do I insert the options ?
    'run /research-home/scatter -x 'xarray' -y 'yarray' -xmin 200 -ymin 0 -ymax 20 -xm

    ReplyDelete
  5. Hi,

    Thank you for the script. I am using this. But I have a problem:

    When I try to plot the regression line without min and max values for the x and y axes t is fine. However, when I try to plot the regression line, with the min and max axes values, as suggested by you (see below), the regression line gets plotted outside of the box. How do I get the line inside the box ?

    The call I used is:
    'run ../scatter -x 'xarray' -y 'yarray' -xmin 200 -ymin 2 -ymax 10 -xmax 1050 -box -sym 4 -symcol 4 -rlinecol 4'

    ReplyDelete
  6. Choosing the right data visualization tool is very important for you to communicate your real-time information clearly. There are many complex visualizations like Scatter Plot, Sankey Diagram, Likert Chart & Pareto Chart but the Scatter Plot is the best to use when finding correlation between two variables.

    Read more here:
    https://blog.zumvu.com/scatter-plot-for-data-visualization/ .

    ReplyDelete