Ok, just finished my second quarterly report. Below I include a number of extracts by way of an update, including some coverage of the differences between analysing census and sample data. I also detail the 5th Phnom Penh Mapping Meetup. And, the rains have come! It's been bucketing down for a solid 40 minutes. Time to trade the bike in for a boat...
Names are anonymised according to the following convention: men are referred to as Jim n and women as Jill n.
Review of Placement Objectives
Improve capability of staff to manage and present spatial data
Similar to the first quarterly report, this objective has not changed since the development of this work plan. The objective is reasonable because of the data management issues at STT. The objective is achievable, especially now that we have a server and because I am conducting a month-long review of how it functions, with myself as the main user. Now that we have conducted a workshop to establish the GIS folder on the server, finalise a GIS job folder structure and remind staff of STT's file naming convention (YYYYMMDD_staff-name_project-name_file-name_vs#.extension), staff are beginning to become familiar with managing data from a central location and handling data using a standardised process. Staff have been only partly receptive this quarter to my ideas about improved data management practices. There was some reluctance during the workshop when I talked about using the server for everyday processing, owing to the threats of power failures, inadequate disk space and failure to back up data. Now that we have addressed the first and primary threat with a UPS and generator, this reluctance is reducing. The disk space problem can be addressed with a larger drive, although the current 1TB drive is adequate and there is a second drive bay not yet in use. Given that internal drives up to 2TB in size are available, 4TB of space is achievable (two bays of 2TB each), after which we can use external, compressed archive drives for old data if necessary. Another reason people are recognising the value of using the server for everyday processing, rather than just as an archive drive for final data, is that I have demonstrated how it can host a shared Excel spreadsheet and make it easy to share other files, such as SPSS outputs, instead of zipping them and attaching them to emails. As mentioned before, two previous volunteers suggested the use of a server, so it is great that this big step in improving the management of data has taken place.
In combination with the equally important standards for data handling (standard folder structure, standard file naming convention), it should deliver a significant efficiency gain for the organisation.
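A naming convention is only useful if files actually follow it, so here's a minimal Python sketch of how one might build and check names against STT's YYYYMMDD_staff-name_project-name_file-name_vs#.extension pattern. The helper names (`make_filename`, `parse_filename`) and the rule that each name part uses only lowercase letters, digits and hyphens are my own assumptions for illustration, not part of the convention as written.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical reading of the convention:
# YYYYMMDD_staff-name_project-name_file-name_vs#.extension
# Assumes name parts contain only lowercase letters, digits and hyphens.
FILENAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})"
    r"_(?P<staff>[a-z0-9-]+)"
    r"_(?P<project>[a-z0-9-]+)"
    r"_(?P<name>[a-z0-9-]+)"
    r"_vs(?P<version>\d+)"
    r"\.(?P<ext>[A-Za-z0-9]+)$"
)


def slug(text: str) -> str:
    """Lowercase a name part and turn spaces/underscores into single hyphens."""
    return re.sub(r"[\s_]+", "-", text.strip().lower())


def make_filename(day: date, staff: str, project: str, name: str,
                  version: int, ext: str) -> str:
    """Build a file name that follows the convention."""
    return (f"{day.strftime('%Y%m%d')}_{slug(staff)}_{slug(project)}"
            f"_{slug(name)}_vs{version}.{ext.lstrip('.')}")


def parse_filename(filename: str) -> Optional[dict]:
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    raw = parts["date"]
    try:
        # Reject impossible dates such as 20131345
        parts["date"] = date(int(raw[:4]), int(raw[4:6]), int(raw[6:]))
    except ValueError:
        return None
    parts["version"] = int(parts["version"])
    return parts


if __name__ == "__main__":
    name = make_filename(date(2013, 5, 10), "Jim 1", "Relocation Sites",
                         "census summary", 2, "xlsx")
    print(name)
    print(parse_filename(name))
    print(parse_filename("final report.docx"))
```

Something like this could be run over the GIS folder on the server to list files that don't conform, rather than policing names by hand.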
Improve capability of staff to perform statistical analyses
Similar to the first quarterly report, this objective has not changed since the development of this work plan. The objective is reasonable because of the poor levels of exploitation of their data at present and the capability of at least some staff members to comprehend intermediate levels of statistical analysis. In the last report I mentioned that statistics workshops were required to achieve the objective. A review with management, however, revealed that they were happy for me to work with staff on an individual basis. As described below, this has led to significant gains in the statistical analysis capacity of Jim 1 through my working with him on the relocation sites census. Progress toward the objective is therefore occurring at a good pace, although it will only be maintained if I move on to the capacity development of other team members, namely Jill 1 and Jim 1 (as requested by management). Whilst I initially wanted to run workshops about statistics, managers have displayed little interest in them. I am still willing to run them, although working with individuals has also proven effective at increasing staff capacity.
For the past three weeks I have been increasing Jim 1's capacity to perform descriptive and inferential statistics for both sample and census datasets. I must also comment, however, that this has first involved my own capacity development. I made a mistake in initially recommending statistical analyses suited to sample datasets when we were actually using a census dataset of all 53 relocation sites. This is because my psych and GIS background mainly deals with sample data (imagine trying to do a census of everyone in the world with histrionic personality disorder), so I initially did not think to adjust my methods. I should say, however, that a suspicion had been lurking in the back of my mind since I started the placement that sample statistics might be the wrong way to go here; I guess there was just too much on for me to pay attention to this inkling until recently. Surprisingly, it proved very difficult to find resources via Google that adequately describe how to go about analysing census datasets. All that I found was this post on the Talk Stats forum. I also found a useful Stack Exchange site, Cross Validated, where a question I posted prompted some more advanced discussion. I should credit a friend in Phnom Penh who was a huge help in understanding all of this by putting it in simple terms:
When you have a sample you use inferential stats to generalise to the
population. When you have a census you already have data for the whole
population, so there is no need to generalise.
For example, if you used sampling, and there is a 3% difference between
groups, then you have to use inferential stats to decide whether that 3%
difference is real, or just due to random chance when you did the
sampling.
But if you did a census, and there is a 3% difference between groups,
well, then there's definitely a 3% difference. That 3% difference is not
due to random chance in sampling, because you have data for the whole
population. However, even with a census you will still need to use your
own judgement to think about why there is a 3% difference (for reasons
other than random chance in sampling), and whether the 3% difference is
large enough to have any practical significance for the work you are
doing.
So basically, just use descriptive stats. Correlations are fine, but you
only need the r value to show the strength of the correlation, not the p
value which is related to random chance in sampling.
A lot of people don't get the difference between sample stats and
census stats, and will complain that you didn't do the stats properly.
I've had cases where I ended up having to do inferential stats on census
data just because people complained so much that there were no p values
on anything!
If you have a lot of missing data from a census sometimes you need some
fancy inferential stats to fill it in. I doubt this will apply to you,
but it does apply to the US population census because (for some bizarre
libertarian reason) completing the census survey is not mandatory in the
US.
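To see my friend's point in action, here's a small Python simulation. It invents a "census" of 53 sites split into two hypothetical groups (27 and 26 sites; all values are made up, not our relocation data), then compares the single, exact difference you get from the census with the differences you get from repeated random samples of 10 sites per group. It's a sketch to illustrate the idea, not how we analysed the real dataset.

```python
import random
import statistics

# Made-up "census": one value per site for 53 sites, split into two
# hypothetical groups (27 and 26 sites). None of these numbers are real data.
random.seed(42)
group_a = [random.gauss(60, 10) for _ in range(27)]
group_b = [random.gauss(55, 10) for _ in range(26)]


def mean_difference(a, b):
    """Difference between the two group means."""
    return statistics.mean(a) - statistics.mean(b)


# Census: every site is measured, so there is one difference and it is exact.
census_diff = mean_difference(group_a, group_b)

# Sampling: drawing 10 sites per group gives a different difference each time.
# This sample-to-sample spread is what p values and confidence intervals address.
sample_diffs = [
    mean_difference(random.sample(group_a, 10), random.sample(group_b, 10))
    for _ in range(1000)
]

print(f"Census difference: {census_diff:.1f}")
print(f"Sample differences ranged from {min(sample_diffs):.1f} "
      f"to {max(sample_diffs):.1f}")
```

Re-running the census calculation always gives the same number, while the sampled differences scatter around it. Inferential tests exist to deal with that scatter, which a census doesn't have.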
This misunderstanding about census vs sample stats meant that about a week spent training Jim 1 in how to prepare data for, and run, inferential tests such as ANOVAs, Kruskal-Wallis tests and correlations was partly wasted. Only partly, because analytic skills for sample datasets will be useful for other work at the NGO in future and for his own career development. Another reason all was not lost is that, as mentioned above, correlations still apply to census datasets; one simply does not have to report the p value, since it indicates how likely a result at least as strong as the observed one would be if it arose purely from random chance in sampling, and a census involves no sampling. I also learned that descriptive stats (and related graphs) make up more of the analyses for census data. We finished these analyses last Friday, so it was a productive week.
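For the census-style analysis, here's a minimal Python sketch of the two things we ended up relying on: a descriptive summary and a Pearson r with no p value. One choice worth flagging: it uses the population standard deviation (`statistics.pstdev`, dividing by N) on the assumption that with a census the data *are* the population; the sample version (`statistics.stdev`, dividing by N - 1) is what you'd use for a sample. The variable names and numbers in the usage part are hypothetical, not from our dataset.

```python
import math
import statistics


def describe(values):
    """Descriptive summary of a census variable (every unit measured)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # pstdev divides by N: with a census the data are the whole population
        "sd": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson_r(x, y):
    """Pearson correlation coefficient only; deliberately no p value."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical values for five sites, for illustration only
    distance_km = [10, 25, 30, 45, 60]
    households = [320, 150, 210, 90, 60]
    print(describe(households))
    print(f"r = {pearson_r(distance_km, households):.2f}")
```

The r here simply describes how strongly the two variables move together across all sites; with a census there is no sampling error for a p value to speak to.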
Improve capability of staff to create and maintain databases
Similar to the first quarterly report, this objective has not changed since the development of this work plan. It is reasonable given that the NGO does not have a database yet has a great deal of data collected from field surveys. Such a database should have been the first thing that the NGO created when it was founded and began doing surveys. A whole placement could be devoted to marshalling their data (in fact, sometimes I wish there were three of me, one for each of the research, advocacy and mapping teams!), but I am confident that I can impart enough skills for us to get started and for them to complete it. We have now started the process of developing a database by creating the entity relationship diagram. Other significant groundwork has been completed, such as finalising the eviction site dataset. Now that we have finalised all of the main datasets for the database, we can create the relation schemas and import the records into the database. We have started, but not yet completed, one final relation, however. This is the one necessary for the many-to-many relationship between the evictions relation and the relocation sites relation, which is needed because families from some eviction sites went to multiple relocation sites and vice versa. Staff are somewhat unsure about this aspect of the eviction sites, but it is good to make a start by recording the relationships between sites that we know of and improving it in future. Now that the eviction site and relocation site datasets are finalised, we can finish this relation at the beginning of the third quarter.
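The usual way to model a many-to-many relationship like evictions to relocation sites is a junction (link) table holding one row per known pair of sites. Below is a minimal sketch using SQLite through Python's built-in `sqlite3` module so it runs anywhere; the table and column names are hypothetical, and I'm not assuming this is the database software the NGO will end up using.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE eviction_site (
    eviction_id   INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE relocation_site (
    relocation_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
-- Junction table: one row per known eviction-site/relocation-site pair
CREATE TABLE eviction_relocation (
    eviction_id   INTEGER NOT NULL REFERENCES eviction_site(eviction_id),
    relocation_id INTEGER NOT NULL REFERENCES relocation_site(relocation_id),
    PRIMARY KEY (eviction_id, relocation_id)
);
"""


def build_db():
    """Create an in-memory database with the three tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def relocation_sites_for(conn, eviction_id):
    """Names of all relocation sites that families from one eviction site went to."""
    rows = conn.execute(
        """SELECT r.name FROM relocation_site r
           JOIN eviction_relocation er ON er.relocation_id = r.relocation_id
           WHERE er.eviction_id = ? ORDER BY r.name""",
        (eviction_id,),
    )
    return [name for (name,) in rows]


def eviction_sites_for(conn, relocation_id):
    """Names of all eviction sites whose families went to one relocation site."""
    rows = conn.execute(
        """SELECT e.name FROM eviction_site e
           JOIN eviction_relocation er ON er.eviction_id = e.eviction_id
           WHERE er.relocation_id = ? ORDER BY e.name""",
        (relocation_id,),
    )
    return [name for (name,) in rows]
```

Because the link table is separate, pairs can be added as staff confirm them, which suits starting with the relationships we know of and filling in the rest later.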
I was happy to find out during a discussion with the previous EWB volunteer that he had run a one-day workshop on setting up a database and linking it to a GIS program called Quantum GIS. I hope to discuss this more with him so I can remind staff of what he taught and continue from where he left off. I suspect few of them recall it, though, as no one at the organisation had mentioned it before, which also makes me doubt that any files from it are still around. +1 more for having a server!
Improve capability of staff to administer and update the UrbanVoice (UV) webmap
Similar to the first quarterly report, this objective has not changed since the development of this work plan. It is a reasonable objective given that staff have been exposed to UV for at least a year now and have developed some confidence about its operation and why we have it. Staff capability to administer and update the site has already improved as a result of my capacity building efforts. Now that Jim 2 has come on board, development of Phase II of the web map can take place. His expertise in JavaScript, PHP and MySQL means that many of Jim 3's questions about improving the code can be answered. So far we have had one session with Jim 2, which involved a discussion of the Phase II document (developed by Jim 3 and me in the first quarter) and a preliminary investigation of the site's code, mainly focusing on how to finish converting the site to Khmer script. Although he can only commit 5 hours per week, this is fine so long as Jim 3 and I are mindful to develop a list of questions each week before he comes, so that we can use his time effectively. We already had this practice in place when we were working with Web Development Firm 1, so I am confident we can get as much out of Jim 2 as we need. I should credit Jim 3 for making significant progress toward the primary goal of Phase II, putting the site in Khmer; he's almost done.
Phnom Penh Mapping Meetup 5
There were three presentations at this meetup. First, Tim Coulas from the Cambodian Land Administration Support Project presented about the huge systematic land registration campaign that has been underway in Cambodia for the past 13 or so years and is set to continue for about another 20. This involved a video about the process and then a lengthy Q&A session. About 15 people attended, half of them Khmer, including 4 from my NGO. It was good to have Tim field some questions from my counterparts; for example, Jim 4 had been surveying plots of land at an Udong relocation site that day as part of our efforts to increase residents' land tenure security. We both asked him how effective these efforts would be, and he said it would be best to wait for the proper systematic process to take place (there are about 1,000 people on the surveying teams operating in a number of provinces at the moment). He stated that the best way NGOs can support the process is to run workshops informing people about the documentation they need to provide on the day the survey team comes to define their property boundaries and resolve disputes. It would also be good for us to help people obtain, print and organise this documentation if they do not already have it. Otherwise, if there is a dispute, they may not have adequate evidence to support their claim, or may have to go through the ad hoc land registration process, which costs hundreds of dollars as opposed to about $3 for a land tenure document via the systematic process.
Tim's presentation was followed by Paul Gager from Aruna presenting about the aerial bombing of Cambodia during the Vietnam War (Prezi presentation, MangoMap of the dataset). This was a very important dataset to see. Ian Thomas, a veteran of the spatial industry in SE Asia who has done a lot of work on the US bombing datasets of Vietnam, Cambodia and Laos, was also in attendance and told us some more very interesting facts afterward. He suggested that one way to grasp the number of bombs dropped on Cambodia is that if they were all loaded onto a train starting in Zurich, the train would stretch into Siberia. Another way he characterised it is that US$20b worth of bombs was dropped, which he said was 10x the NASA moonshot budget at the time. As pointed out on the Mango Maps blog post about the dataset, this extraordinary expenditure and the horrendous destruction of a people and nation did not actually serve the broader strategic objective very well. Zoom into Phnom Penh on the map and you'll notice something interesting.
Last of all we had our first Khmer presenter!! Jim 5, a member of our advocacy team, presented about UrbanVoiceCambodia.net. It was a moment of great pride for me that we finally had a local presenter, and that the presentation was about the work of the NGO. He was hoping to give the presentation in Khmer, but there were not enough in the audience to justify it. Hopefully next time we can have a presentation in Khmer, by a Khmer.
OpenStreetMap 12hr Update Marathon 1
A friend from LICADHO and I came up with the idea of a marathon update session for OpenStreetMap, because Cambodia's coverage on the map needs the following:
- district and province boundaries,
- protected areas,
- revision of roads classification in Phnom Penh (too many primary and secondary roads),
- deletion of duplicate roads southwest of Wat Phnom,
- addition of roads to Sihanoukville,
- addition of railway lines to Sihanoukville and Poipet,
- addition of building footprints to Phnom Penh, and
- an update of the OSM Cambodia wiki!
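A couple of the items on that list can be checked with a bit of code before the marathon. Here's a rough Python sketch that works on OSM ways already loaded as `(way_id, tags, node_ids)` tuples (a hypothetical structure; it doesn't download or parse an OSM extract). It reports the share of each `highway=*` class, to put a number on "too many primary and secondary roads", and flags ways whose node lists are identical, forwards or reversed. That duplicate check only catches copies that reuse the same nodes; roads traced twice with separate nodes would need a geometric comparison instead.

```python
from collections import Counter


def highway_class_shares(ways):
    """Share of highway-tagged ways in each highway=* class."""
    counts = Counter(tags["highway"] for _, tags, _ in ways if "highway" in tags)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()} if total else {}


def duplicate_ways(ways):
    """Pairs (first_id, later_id) of ways with identical node lists.

    A way traced in the opposite direction counts as a duplicate, so each
    node list is reduced to a canonical key before comparison.
    """
    seen = {}
    duplicates = []
    for way_id, _, nodes in ways:
        key = min(tuple(nodes), tuple(reversed(nodes)))
        if key in seen:
            duplicates.append((seen[key], way_id))
        else:
            seen[key] = way_id
    return duplicates


if __name__ == "__main__":
    # Made-up ways for illustration only
    ways = [
        (1, {"highway": "primary"}, [100, 101, 102]),
        (2, {"highway": "secondary"}, [200, 201]),
        (3, {"highway": "primary"}, [102, 101, 100]),
        (4, {"highway": "residential"}, [300, 301]),
        (5, {"building": "yes"}, [400, 401, 402, 400]),
    ]
    print(highway_class_shares(ways))
    print(duplicate_ways(ways))
```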
Making trouble in Phnom Penh
A group moto trip to Mondulkiri (Sen Monorom) and Kratie is planned for four days next weekend. Should be good! I was also quoted in the Phnom Penh Post for the first time! My name isn't mentioned, but I'm the one who liked the steak.