Relationships as "baggage", or "baggage" isn't always a bad thing
One of the foundations of social network analysis is the idea that people have connections between them. These connections can be anything from frequency of interaction or advice-seeking to liking, friendship, or joint membership in groups or organizations. Whereas these observed links have been thoroughly scrutinized, little attention has been paid to why the links exist in the first place: why they are there, how long they are likely to persist, and what forces act on them, causing them to die or to come into being. This project proposes a generative model of relationships, whether friendships, collaborations, or other interactions, that depend on factors inherent to the actors and on the context within which they exist. The model also accounts for the fact that context is likely to change, and that such changes can strongly affect existing relationships and the probability of new ones. It is an iterative model of the evolution of ties (their birth, their maintenance, and their death), conditional on geographical proximity and other factors of similarity.
What if, at each iteration, one of several events can happen? Some events happen with high probability (e.g., reinforcement of existing ties), while others happen with lower probability (e.g., new ties with geographically proximal partners who rate highly on the similarity metric). Still others happen with very low probability: new ties with partners who are not geographically proximal, or with proximal partners who rate low on similarity.
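A minimal sketch of a single iteration's event draw, in Python. The three event types and their base probabilities (0.80, 0.15, 0.05) are hypothetical placeholders chosen only to reflect the high / lower / very-low ordering described above; in the actual model these probabilities would depend on proximity and on the similarity metric.

```python
import random

# Hypothetical event types and base probabilities (illustrative only).
BASE_PROBS = {
    "reinforce_existing": 0.80,     # high probability
    "new_near_similar": 0.15,       # lower: proximal, highly similar partner
    "new_far_or_dissimilar": 0.05,  # very low: distant or dissimilar partner
}


def sample_event(probs, rng):
    """Draw one event type with probability proportional to its weight."""
    events = list(probs)
    return rng.choices(events, weights=[probs[e] for e in events], k=1)[0]


def simulate(n_iterations, probs=BASE_PROBS, seed=0):
    """Count how often each event type occurs over n_iterations."""
    rng = random.Random(seed)
    counts = {e: 0 for e in probs}
    for _ in range(n_iterations):
        counts[sample_event(probs, rng)] += 1
    return counts


print(simulate(10_000))
```

Over many iterations the counts should track the chosen probabilities, which is a quick sanity check before the probabilities are made conditional on actor attributes.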
Then we need to add shocks to the system: at each iteration, the actor's geographical position is compared with its position in the prior iteration. A change of position (a move) has low probability, but when it happens, the probabilities over the set of potential events shift at first and then return to normal over time (harmonic frequencies?).
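One way to sketch the shock, assuming exponential decay back to baseline (a swapped-in choice; the harmonic alternative is not modeled here). `BASE`, `SHOCKED`, `p_move`, and `tau` are hypothetical names and values. Right after a move, reinforcement of existing (now long-distance) ties is made more likely and new local ties less likely; the blend weight `exp(-t/tau)` then pulls the probabilities back toward `BASE` as iterations pass.

```python
import math
import random

# Hypothetical baseline and immediately-after-move event probabilities.
BASE = {"reinforce_existing": 0.80, "new_near_similar": 0.15,
        "new_far_or_dissimilar": 0.05}
SHOCKED = {"reinforce_existing": 0.95, "new_near_similar": 0.03,
           "new_far_or_dissimilar": 0.02}


def event_probs(t_since_move, tau=3.0):
    """Blend SHOCKED into BASE with a weight that decays as exp(-t/tau)."""
    if t_since_move is None:          # actor has never moved
        return dict(BASE)
    w = math.exp(-t_since_move / tau)
    return {e: w * SHOCKED[e] + (1 - w) * BASE[e] for e in BASE}


def simulate_actor(n_iterations, p_move=0.05, tau=3.0, seed=0):
    """Return a list of (iteration, moved, event) tuples for one actor."""
    rng = random.Random(seed)
    t_since_move = None
    history = []
    for i in range(n_iterations):
        # Position check: did the actor move since the previous iteration?
        moved = rng.random() < p_move
        if moved:
            t_since_move = 0
        elif t_since_move is not None:
            t_since_move += 1
        probs = event_probs(t_since_move, tau)
        events = list(probs)
        event = rng.choices(events, weights=[probs[e] for e in events], k=1)[0]
        history.append((i, moved, event))
    return history
```

Because both `BASE` and `SHOCKED` sum to 1, every blended distribution also sums to 1, so no renormalization step is needed.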
The idea is relatively simple: in each geographical location, people meet other people and develop sets of personal relationships. As they move, a selection process is at work: movers keep some relationships and lose others, at various rates of decay. Once movers arrive in the new location, forming new social (and work) relationships takes time and effort. Since many researchers are not especially extroverted, meeting new people and initiating new collaborations can take a while. So for some time after a move, most collaborations happen at a distance, through pre-established ties. However, as geographically proximal relationships develop, the frequency of interaction or collaboration with distant contacts slowly decreases until most of those relationships become dormant and only a few persist.
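The decay story above can be sketched as a deterministic curve. All of the following are hypothetical modeling assumptions: each retained long-distance tie decays exponentially at its own rate, collaboration with new local partners ramps up as `capacity * (1 - exp(-t / ramp_time))`, and a tie counts as dormant once its activity falls below `threshold`.

```python
import math


def long_distance_activity(t, decay_rates):
    """Activity of each retained long-distance tie t years after a move.

    Each tie decays exponentially at its own (hypothetical) rate."""
    return [math.exp(-r * t) for r in decay_rates]


def local_activity(t, capacity=3.0, ramp_time=2.0):
    """Collaboration with new local partners ramps up slowly (assumed form)."""
    return capacity * (1 - math.exp(-t / ramp_time))


def long_distance_share(t, decay_rates, capacity=3.0, ramp_time=2.0):
    """Fraction of all collaboration activity that is long distance at time t."""
    far = sum(long_distance_activity(t, decay_rates))
    near = local_activity(t, capacity, ramp_time)
    total = far + near
    return far / total if total > 0 else 0.0


def n_active(t, decay_rates, threshold=0.1):
    """Number of long-distance ties still above the dormancy threshold."""
    return sum(a >= threshold for a in long_distance_activity(t, decay_rates))


rates = [0.1, 0.5, 1.0, 2.0]
for year in (0, 1, 5):
    print(year, round(long_distance_share(year, rates), 2), n_active(year, rates))
```

With these rates, all collaboration is long distance at `t = 0`, and after 5 years only the slowest-decaying tie is still above the 0.1 threshold, mirroring the "only very few persist" pattern.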
As people move, the story repeats itself. Over time, one can observe researchers with a huge range of active and dormant ties. Like a spiderweb, their network covers large distances, but only a few of the strands are ever really active professionally (though probably more are active personally). Many of these ties are renewed, or at least maintained, at conferences and workshops, so that some dormant ties can return to a more active state of collaboration at times. As researchers mature, their range of interests branches out, and so does their range of influence. They develop more relationships with their students than with other faculty and continue those collaborations for longer periods of time. Yet even those relationships decay and eventually return to a dormant state.
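A per-tie sketch of the active/dormant cycle. Assumptions (all hypothetical): activity shrinks by a fixed per-year factor `decay` (closer to 1 for student ties, which last longer), a tie is dormant below `DORMANCY_THRESHOLD`, and meeting a partner at a conference or workshop restores full activity with probability `p_renew`. The names and values are illustrative, not estimates.

```python
import random
from dataclasses import dataclass

DORMANCY_THRESHOLD = 0.2  # hypothetical cut-off between active and dormant


@dataclass
class Tie:
    partner: str
    decay: float = 0.6     # hypothetical share of activity kept each year
    activity: float = 1.0

    def is_active(self) -> bool:
        return self.activity >= DORMANCY_THRESHOLD


def step_year(ties, met_at_conference, p_renew, rng):
    """Advance every tie by one year.

    Activity shrinks by the tie's decay factor; if the partner was met at a
    conference or workshop this year, the tie is restored to full activity
    with probability p_renew."""
    for tie in ties:
        tie.activity *= tie.decay
        if tie.partner in met_at_conference and rng.random() < p_renew:
            tie.activity = 1.0


rng = random.Random(0)
ties = [Tie("colleague"), Tie("student", decay=0.85)]  # student ties decay slower
for _ in range(4):
    step_year(ties, set(), p_renew=0.5, rng=rng)
print([(t.partner, t.is_active()) for t in ties])  # [('colleague', False), ('student', True)]
step_year(ties, {"colleague"}, p_renew=1.0, rng=rng)
print([(t.partner, t.is_active()) for t in ties])  # [('colleague', True), ('student', True)]
```

Here the colleague tie goes dormant after four years and is revived by a conference meeting, while the slower-decaying student tie stays active throughout.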
One way to observe these processes is to take a citation database for a particular field and match each researcher's geographical location to each citation record; this would be a way to track geographic movement. Computer Science databases would suit this purpose better than those in the social sciences or humanities, because publication turnaround is much quicker and the volume is larger. In the data, researchers would be nodes and co-authorships would be links, weighted by the co-authors' geographical locations: links between geographically proximate co-authors would carry a different value than links between distant ones (because proximate links should be easier to maintain). Another attribute of each link would be the number of articles co-authored to date (regardless of geographical proximity). The result would be a dynamic network with time as a defining variable, with snapshots taken at 1-year intervals.
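A standard-library sketch of how the yearly snapshots could be assembled. Assumptions: each multi-author paper has already been expanded into one `(year, author_a, loc_a, author_b, loc_b)` record per author pair, locations are `(latitude, longitude)` in degrees, distance is great-circle (haversine) distance, and `near_km = 50` is a hypothetical cut-off for "proximate". Each link carries the cumulative paper count and the distance at the pair's most recent co-authorship.

```python
import math
from collections import defaultdict


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def yearly_snapshots(records, near_km=50.0):
    """Build one edge dictionary per year from (year, a, loc_a, b, loc_b) records."""
    records = sorted(records, key=lambda r: r[0])
    if not records:
        return {}
    papers_to_date = defaultdict(int)   # edge -> co-authored papers so far
    last_distance = {}                  # edge -> distance at latest co-authorship
    snapshots, i = {}, 0
    for year in range(records[0][0], records[-1][0] + 1):
        while i < len(records) and records[i][0] == year:
            _, a, loc_a, b, loc_b = records[i]
            edge = tuple(sorted((a, b)))
            papers_to_date[edge] += 1
            last_distance[edge] = haversine_km(loc_a, loc_b)
            i += 1
        # Every year gets a snapshot, including years with no new papers.
        snapshots[year] = {
            e: {"papers_to_date": n,
                "distance_km": last_distance[e],
                "proximate": last_distance[e] <= near_km}
            for e, n in papers_to_date.items()
        }
    return snapshots


records = [
    (2001, "A", (40.44, -79.94), "B", (40.44, -79.95)),    # same city
    (2003, "A", (40.44, -79.94), "B", (37.43, -122.17)),   # B has moved
]
snaps = yearly_snapshots(records)
print(snaps[2002][("A", "B")]["proximate"], snaps[2003][("A", "B")]["papers_to_date"])  # True 2
```

The resulting per-year edge lists could then be exported in whatever format the downstream analysis tool accepts.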
There are two things to discern in such a network:
1. to see whether the model described above fits the data
2. to identify the "connectors" in the network: people who may not publish very much themselves, but who foster many collaborations between others by putting them together on the same paper once or twice.
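One hypothetical way to operationalize "connectors": credit author `c` with a pair `(x, y)` when `c` was on the pair's first joint paper and the pair later published together without `c`, then divide by `c`'s own paper count so that prolific authors are not automatically favored. This scoring rule is an assumption for illustration, not an established measure; standard brokerage or betweenness measures would be alternatives.

```python
from collections import defaultdict
from itertools import combinations


def connector_scores(papers):
    """Score authors who introduce co-authors who later work without them.

    papers: iterable of (year, set_of_authors). Author c is credited with a
    pair (x, y) if c was on the pair's first joint paper and the pair later
    published together without c. The credit count is divided by c's own
    number of papers, so low-output brokers can score highly."""
    papers = sorted(papers, key=lambda p: p[0])   # stable: ties keep input order
    n_papers = defaultdict(int)
    first_team = {}                # pair -> other authors on its first joint paper
    credited = defaultdict(set)    # author -> pairs they introduced
    for _, authors in papers:
        for a in authors:
            n_papers[a] += 1
        for pair in combinations(sorted(authors), 2):
            if pair not in first_team:
                first_team[pair] = set(authors) - set(pair)
            else:
                for c in first_team[pair] - set(authors):
                    credited[c].add(pair)
    return {c: len(credited[c]) / n for c, n in n_papers.items()}


papers = [
    (2000, {"C", "X", "Y"}),   # C brings X and Y together
    (2002, {"X", "Y"}),        # X and Y continue without C
    (2003, {"X", "Y", "Z"}),
]
print(connector_scores(papers))
```

In this toy example `C` has a single paper but is credited with the lasting X-Y collaboration, so `C` gets the highest score even though `X` and `Y` publish more.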
How:
1. data collection - crawl the ACM database
2. analysis with ORA (although the generative model can be implemented in R first)
