Almost two and a half years ago, I created an account on a Chinese question-and-answer website and started to answer questions about linguistics and languages raised by lay Chinese netizens. It is my pleasure to introduce linguistics to the public in Chinese – after all, “modern western linguistics” itself is a relatively new concept to teenagers and young adults, and some people are curious about it. Although most of the questions I come across are rather basic and some are not very scientific (for example, someone may ask me whether Chinese really has grammar, which is not a very good question from my point of view), I am more or less satisfied if people come to know more about the scientific study of different languages, and even decide to take a course in linguistics.
However, I constantly receive criticism from certain readers, not about the content of my answers, but about the way in which I deliver them, or, to be more specific, about the languages I use when I talk about those topics. They have told me, politely or impolitely, that they were “annoyed” by the use of English terminology in my answers, and that by putting English words in a Chinese article, I was “damaging the beauty, purity and integrity of Chinese”. Someone even “threatened” to give up reading my answers because he could not stand the English words in them. Before I started this post, I did a count on my ten most recent answers to linguistic questions, and found that over 90% of the text was written in Chinese, though I did use a lot of English terminology when I talked about some linguistic theories.
I am going to protest my innocence here – I really do not choose to do so. I received most, if not all, of the content related to linguistics in English, and currently I see English as my major working language. The influence of English is so deep-rooted that whenever I would like to refer to a ready-made concept in linguistics, such as “psychotypology” (see my previous post for more details), “telicity”, or “aspect marker”, the first words that pop up in my mind will be in English, and I barely know their translation equivalents in Chinese. Therefore, almost every time I intend to introduce a new concept to my audience or refer to the key research methodology, I will put it in English first and try my best to give an appropriate Chinese translation, while the rest of my answer is in Chinese.
Maybe you have already realised that this phenomenon is called code-mixing or code-switching. In sociolinguistics and second language acquisition, it is a long-standing topic and has been investigated from various perspectives, including the roles of the matrix language (the language that forms the grammatical structure of a chunk of utterance) and the embedded language (the language that provides the “mixing” words in the chunk), the social identity of code-switchers and the opinions of those around them, and other possible areas. I am particularly interested in the word types, the teaching methods and the linguistic codes involved in code-switching. According to my personal observation of myself and of friends with a similar sociolinguistic background, whether you switch to the embedded language depends on two different factors. The first one is what type the word or phrase is, namely content words (nouns, verbs, adjectives, etc.) or grammatical words (pronouns, prepositions, conjunctions, etc.). The Differential Access Hypothesis, which is a well-discussed framework of code-switching, assumes that content words and morphemes appear more frequently in the embedded language while grammatical morphemes tend to appear in the matrix language (Myers-Scotton 2005), and a good deal of carefully recorded and anecdotal data supports this hypothesis.
More importantly, the use of code-switching is related to the environment in which, and the method by which, you acquire a word – in fact, it is related to how you add the new second-language word to your “mental all-language dictionary”, the multilingual lexicon. We assume that we store the vocabulary we know in different languages in a unified structure, and a simple illustration of a possible bilingual lexicon, developed by Kroll and Stewart (1994), is shown here.
This proposal of the bilingual lexicon gives us plenty of material for raising hypotheses about how bilinguals memorise and use words. We can also draw on it to explain the choice of code-switching. Sequential bilinguals like me, namely people who start to acquire a second language after becoming fluent in their first, start learning second-language words by matching them to their translation equivalents. For instance, when I first learnt the word “apple” (which was among my first English words), I did not establish a direct link between the English word “apple” and the round red juicy fruit, but went via the Chinese word “pingguo”, since I learnt the word through its Chinese translation and needed to rely on the Chinese word to retrieve the English one at an early stage of acquisition. That is why the link from one’s second language to one’s first language is much stronger than the link between the concept and one’s second language. In that situation, since the connection from the concept to the first language is stronger than the connection to the second language, we will continue to use first-language words in the utterance. This applies to some content words and most grammatical words, which can also explain why we always use our first language as the matrix language of code-switching.
Things become different when we acquire a new word while using a second language. This covers second language immersion programmes (you can see a lot of them in Cambridge every summer), using the second language as the working or teaching language (which was the case when I was in Hong Kong and the UK), and other environments in which you do not always use your first language. In that situation, we can directly build up the link between the form of a second language word and the concept it represents, and there is no need to refer to the first language vocabulary anymore. When we retrieve these words and examine the connections between concepts and word forms, we will find that the second language words are more readily used, and we are tempted to use them even if the rest of the utterance is organised in another language. Linguistic terminology in English is a good example for me: the imbalanced bilingual lexicon drives me to use English words when I think of the concepts, because that is the easiest way to do so.
Code-switching is definitely more complicated than switching individual words, but I just want to add an alternative viewpoint to the whole picture. Maybe the language purists simply cannot understand the use of two different languages at the same time, and they can only attribute such things to showing off, but I believe that the motive is more cognitive. Previous research on code-switching indicates that it always comes with an additional cost: code-switchers need more time to prepare for articulation (or typing, of course), and they may need to activate the words in the other language and adjust the structure of sentences as well (see Meuter and Allport 1999; Meuter 2005). Nevertheless, my friends and I code-switch all the time, and we do it for communicative purposes: if the English word can better deliver the intended message, why not use it in an utterance? After all, as intelligent animals, humans will not do anything without considering its convenience, and language use is no exception.
For more details, please check the following items:
Kroll, J. F., and E. Stewart. 1994. ‘Category Interference in Translation and Picture Naming: Evidence for Asymmetric Connections Between Bilingual Memory Representations’, Journal of Memory and Language, 33.2: 149–74
Meuter, Renata F. I. 2005. ‘Language Selection in Bilinguals’, in Handbook of Bilingualism: Psycholinguistic Approaches, ed. by J. F. Kroll and A. M. B. de (New York, NY, US: Oxford University Press), pp. 349–70
Meuter, Renata F. I., and Alan Allport. 1999. ‘Bilingual Language Switching in Naming: Asymmetrical Costs of Language Selection’, Journal of Memory and Language, 40.1: 25–40 <http://dx.doi.org/10.1006/jmla.1998.2602>
Myers-Scotton, Carol. 2005. ‘Supporting a Differential Access Hypothesis: Code Switching and Other Contact Data’, in Handbook of Bilingualism: Psycholinguistic Approaches, ed. by J. F. Kroll and A. M. B. de (New York, NY, US: Oxford University Press), pp. 326–48
In modern language sociolinguistics we are often interested in investigating the speech of specific social groups. We might compare the speech of people from different ethnic groups, or different socio-economic classes or genders. Alternatively, we might investigate differences in language use in different contexts. How do people use language differently in formal contexts like job interviews as compared with informal contexts, like chatting with friends in the pub?
In either case, the first step is to collect data: to record language use by the different groups of people we’re interested in, or in the different contexts we’re interested in. But what can we do when that’s impossible? When we’re investigating historical languages, we’re limited to whatever language happens to have been written down and whichever bits of writing happen to have survived until the present. That’s normally quite a skewed sample in lots of ways. In many historical periods only certain social groups (typically wealthy, powerful men, often particularly those associated with the church) learned to read and write, and so only those social groups leave a written record of their language. Furthermore, language was only written down in certain contexts: records of laws and legal proceedings, religious writings, financial transactions and perhaps narrative literature and poetry. Unlike with today’s social media, casual everyday interactions did not take place in writing. So how can we investigate the language of other social groups, or language use in informal contexts?
One possible answer is by investigating reported speech in fiction. Unlike scribes and the authors of texts, characters in fiction may come from a wide range of social groups and fiction may describe everyday interactions, providing us with data to investigate.
Obviously we can’t assume that the language used by characters in fiction was identical to the language of similar real people in society at the time—authors will undoubtedly have been best at representing their own language and the language of the social groups with whom they normally interacted. However, represented speech in fiction is often used as an expressive tool to represent the very social phenomena that we’re interested in, which encouragingly suggests that we should find interesting variation to research (Kiełkiewicz-Janowiak 1999:59; Culpeper 2009:81, 307). Better still, parallel research on language in modern fiction does suggest that language use by characters from particular social groups can reflect the language used by those social groups in reality. Work on language use by male and female characters in Japanese sitcoms has found that language use by female characters has many of the same features which typify spontaneous speech by female speakers. Features which people are quite well aware of and make use of for stereotyping can be even more pronounced in the fictional speech than in real speech (Shibamoto 1987:48; Shibamoto Smith 2004:126). Similar findings have been reported for male speech (Occhi, SturtzSreetharan & Shibamoto Smith 2010) and for Japanese novels rather than sitcoms (Shibamoto Smith 2004).
So if this works for modern languages, we should also be able to do it for historical languages, right? And some researchers have done just this. Research on Latin texts seems to show that male and female characters make slightly different choices of words (Adams 1984). Work on Classical Greek drama has shown differences between the speech of male and female characters in terms of choices of words (Bain 1984; Sommerstein 1995), choice of conversation topics, rhetorical structures (Mossman 2001), and choices of pronouns (Meluzzi 2010). Willi’s work on differences in grammar and choice of words in the speech of female and male characters in Classical Greek comedy goes further still, showing that what differences there are were understood in similar ways to gendered differences in modern languages: female characters used more politeness features and more innovative features (Willi 2003:176–195), and speech had more such characteristics in single-gender groups than in mixed groups (Meluzzi 2010:96–98; Willi 2003:196).
This stuff is really exciting. Classical Greece and Rome were incredibly sexist societies: very few women learned to read and write and vanishingly little written material by women survives. So, language use by fictional characters may be our only possible window on the language of Greek and Roman women in this period.
In my own work, I’ve tried to go one step further. The studies cited above all looked at language in fiction in just one time period. Studying language in Old Icelandic fiction, I’ve taken represented speech from texts spanning almost three centuries to try and find out whether the way that female and male characters were involved in changing language over time is similar to the way we know that people of different genders are involved in language change in modern societies. As I mentioned in an older blog post, a common pattern in modern societies is that women are found to lead in language change, using more of a newer form earlier than men. And the results of my study do seem to show a similar pattern for one change which was taking place in Old Icelandic. As an older form is replaced by a newer one over several centuries, the represented language of female characters seems to stay about 15–20% ahead of the language of male characters.
Unlike with modern language studies, we’ve no way of then going and confirming that this language use in fiction really did reflect the use by real people. Nevertheless, it’s exciting to get a hint of patterns like this which would otherwise be lost! If you’re interested to read more about my study, you can find the paper on my Academia.edu page.
In this post I will discuss one of the more important (and, in some quarters, more controversial) ideas in modern theories of grammar. This is that there are some elements (“words”, if you like) which are present syntactically but phonologically have no form – which means they are there but we can’t see or hear them.
There are all sorts of examples of this. Take, for one, the following sentence:
Which kitten did Lucy buy?
Now, which kitten here is semantically the object of buy – compare the sentence Lucy did buy a kitten. That second sentence is an example of the fact that in English objects usually come after verbs. But this doesn’t seem to be the case in Which kitten did Lucy buy? For this reason and others linguists have suggested that which kitten starts off after the verb and is “copied” to the front of the sentence: but only the first copy you reach is pronounced. So the sentence really looks something like this (with the unpronounced copy in angle brackets):
Which kitten did Lucy buy <which kitten>?
Another example is not from English. Lots of languages allow pronouns like I and they to be left unspoken in some contexts. Spanish is a good example:
Vivo en Cambridge.
live-I in Cambridge
There is no pronoun I here (the same meaning is conveyed through the suffix -o on the verb). But there are good reasons for believing all sentences need to have a subject, even in a language like Spanish. So it’s suggested that there is a pronoun there, in the normal place, it just has a “zero phonetic realisation” – it doesn’t contain any sounds that need to be pronounced, so you can’t hear it.
There are examples a bit like this in English too. Here’s another sentence:
It is important to feed yourself.
yourself is a sort of word called a “reflexive pronoun”, which basically means it needs to refer back to something earlier in the sentence. Hence in a sentence like You like yourself, yourself refers back to you. But in It is important to feed yourself there’s nothing for yourself to refer back to. Therefore, we can postulate a silent pronoun, which is also useful as it gives us a subject for to feed, a bit like the following (with the silent pronoun in angle brackets):
It is important <you> to feed yourself.
(Note that, in this instance, the sentence would be ungrammatical if the pronoun was pronounced.)
As a final, relatively easy example, compare the following two sentences:
Harry said that he would freeze the fish-fingers.
Harry said he would freeze the fish-fingers.
Spot the difference? The two sentences are identical except that one has that in and one doesn’t. One way of looking at this is to claim that the second sentence does contain an element equivalent to that, it just happens to be silent.
So we can conclude from this and many more examples that not everything in a sentence is necessarily pronounced, and that we can learn a lot about language by looking beyond what we hear to things that we don’t.
’Tis the season to be merrily playing board games! Recently we were given a rather good new one by some friends, called Hanabi, and all our guests and family members have been subjected to it. It’s definitely a game for a pragmatician like me.
My interest was first piqued when I was told, upon presentation, that it is a co-operative game. Now, regular readers of this blog might remember previous posts on Pragmatics, introducing a chap called Paul Grice, a British philosopher and linguist whose thinking is foundational for much present-day pragmatics. His big thing was The Co-operative Principle: “Make your contribution [to the conversation or exchange] such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.” Now, you might think that such a statement itself is a bit abstruse and not very, well, co-operative. But it’s basically saying: say the right thing at the right time in the right way.
We do this all the time when we communicate. When I woke up the other morning and exclaimed to my husband ‘the bin!’, he had no trouble inferring that I meant something like, “help! it’s a Thursday – we must get the black wheely bin out at once to avoid having smelly rubbish on our hands for the next fortnight!” But, in another context, that might not have worked, or it might have meant something entirely different. Most of the time we know how much information we need to convey to our conversation partner.
But what happens when there are some extra constraints on our communication? That’s where the fun of Hanabi starts. All the players have to work together against the game, to construct a wonderful fireworks display, being “absent-minded firework manufacturers who accidentally mixed up powders, fuses and rockets, [with] the show about to start and panic setting in” (though feel free to ignore the back story). Everyone holds their cards facing outwards, so I can see everyone else’s cards, containing elements of the fireworks display, but not mine. Players have to communicate pieces of information to other players, so that they know which of their own cards to play or discard. ‘Easy!’, you might be thinking. But just wait a second. In any one turn you can only communicate one piece of information about the number or the colour of the cards in one other person’s hand. So for example, if I’m looking at your cards (below), I might say “you have two fours”, or, alternatively, “you have one red”, and point those cards out.
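For the programmatically minded, the hint rule just described can be sketched in a few lines of Python. This is only an illustration: the hand below is made up (it is not an actual deal from the game), and the name `cards_matching_hint` is a hypothetical helper, not part of any Hanabi software.

```python
from typing import List, Tuple, Union

# A card is modelled as (colour, number). This representation is an
# assumption made for the sketch, not something defined by the game rules.
Card = Tuple[str, int]


def cards_matching_hint(hand: List[Card], hint: Union[str, int]) -> List[int]:
    """Return the positions in `hand` picked out by one hint.

    A hint is either a number (int) or a colour (str), never both,
    mirroring the "one piece of information" restriction.
    """
    if isinstance(hint, int):
        return [i for i, (_, number) in enumerate(hand) if number == hint]
    return [i for i, (colour, _) in enumerate(hand) if colour == hint]


# A made-up hand in which "you have two fours" and "you have one red" both hold.
hand = [("blue", 4), ("red", 1), ("white", 4), ("green", 2)]

print(cards_matching_hint(hand, 4))      # positions of the fours
print(cards_matching_hint(hand, "red"))  # positions of the reds
```

The code only says which cards a hint points at. Working out why the speaker chose that hint is the pragmatic part, and that is left to the players.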
So, as the speaker, I have to decide not only what the most useful thing for another player to know is, but also how I can communicate that to them. I have to take into account what we both know about the game so far, what the most salient aspects of the game currently are, what information other people have already communicated and how. In other words, I have to ‘do pragmatics’, obeying the Co-operative Principle – what we all do all the time when we’re chatting, or writing a blog post. But the difference here, playing Hanabi, is that, like the cards, it’s all on the table. The reasoning I’m doing about what I want to implicate and what inferences my chosen other player will make is conscious (and sometimes somewhat tortuous). Usually making a co-operative contribution to some conversation requires complex but pretty subconscious reasoning; in Hanabi, the interesting twist is that both speaker and hearer are paying conscious attention to it. And I wonder what difference that makes?
From my own anecdotal experience, it can be extremely difficult, as the hearer – the player given information to work out what someone else intends me to infer – precisely because I’m giving equal consideration to the numerous options: ‘now, has he told me that’s a red, because he wants me to play it now, or play it later, or it’s no longer useful and can be discarded, or…?’ That’s the kind of quandary we usually only find ourselves in in situations of miscommunication, when it’s really unclear what someone meant. (Another domestic example: Me – have you washed the pots? Husband – yes. Me – but they’re still muddy. Him – oh, I thought you meant pans, not potatoes!).
Just in case you’re wondering whether this is a nice ol’ ramble, but a bit far removed from any serious linguistic content: a couple of pragmaticians [1] did actually conduct a study not a million miles removed from Hanabi, in which participants had to communicate which object an interlocutor should pick up using only colour or shape, to find out whether speakers can refer to objects optimally, and hearers can interpret as the speaker intended. (Answer: yes, but it’s complicated…)
My experience of Hanabi, though, makes me wonder how much people’s communicative behaviour changes when they’re placed in such a peculiar game situation, with conscious attention paid to communicating co-operatively; how much does it tell us about everyday linguistic reasoning?
But for now, it’s back to some more playing at pragmatics.
[1] Ciyang Qing and Michael Franke (2015). Variations on a Bayesian Theme: Comparing Bayesian Models of Referential Reasoning. In: Bayesian Natural Language Semantics and Pragmatics, ed. by Hans-Christian Schmitz and Henk Zeevat, Heidelberg, Springer
It is that time of the year again: an overdose of Wham! on the radio, the annual parade of cheesy jumpers, an increased interest in “working from home” (somehow clustering around days after office Christmas parties), and, for anyone taking a beginners’ language class, the time to learn seasonal greetings in the target language. For my Japanese class, this turned out to be a bit of a no-brainer: Merry Christmas is rendered into Japanese as メリー クリスマス, or Meri Kurisumasu. With the linguistic aspect of our final class memorized in seconds, everyone could happily focus on creating appropriately kawaii origami Christmas cards.
Christmas has been celebrated in Japan only for the past two decades or so, being largely a cultural import from the USA (with uniquely Japanese features – Japanese Christmas is definitely not a religious festival and has more of a Valentine’s day touch to it). As with so many borrowed things, it is not surprising that the word for Christmas, along with the whole greeting, is also borrowed. Loanwords such as these are used directly from another language with little or no translation, and are a handy thing to have around when in search of le mot juste; how would you express the concept of Schadenfreude without a little help from German (or, for that matter, any of the concepts on this list of words English has borrowed)?
How many loanwords are adopted into a language and what happens to them depends on many non-linguistic factors. The Académie française, the French council for matters pertaining to the French language, is notorious for its abhorrence of the likes of le weekend, while the Finnish equivalent has a history of failed attempts to introduce Finnish alternatives to incoming loanwords: sohva ‘sofa’, for instance, was rendered joukkoistuin, literally translated as ‘group seater’.
Whatever the perceptions of the acceptability of loanwords, borrowings often undergo processes of change to fit in more comfortably with the phonology of the borrowing language. Hence loanwords, even if superficially very similar to the original, are rarely pronounced in exactly the same way as in their language of origin: don’t expect your English version of French bric-a-brac or German Doppelgänger to make you sound like a native of either language. Interestingly for the linguist (and all the readers of CamLangSci!), the sorts of processes loanwords undergo can inform us about the phonological structure of the borrowing language.
What about Meri Kurisumasu? It turns out that this easy-to-learn Japanese-for-dummies greeting from my Japanese class is a linguistic gold mine of Japanese sound structure:
Given these basics of the Japanese sound system, why not convert some of that excess mince pie energy into brain power and figure out what abunoomaru, ueetoresu, salaliiman (all from English), and alubaito (from German) mean in Japanese? (And if that has left you under the impression that English is more of a lender than a borrower, this serves to show that it is in no way a one-way relationship between Japanese and English.)
With that, I wish a Meri Kurisumasu (whatever form this may take for you) from all of CamLangSci to all of you!
Earlier this year I was reading Henry Fielding’s 1751 novel Amelia. A matter of linguistic interest that struck me was the frequent use of the phrase you was, nowadays stigmatised as pretty firmly non-standard English, and certainly not something you particularly expect to find used by posh characters in a classic novel. But there it was, repeatedly, for example:
You might wonder if you was was just something everybody used in the past, and the modern you were is a more recent innovation. But in fact you were is definitely the older term. The paradigm of to be in the past tense in English in about the sixteenth century was something like the following:
| Singular | Plural |
|---|---|
| I was | we were |
| thou wast | ye were, you were |
| he/she/it was | they were |
were is used with the plural forms and was(t) with the singulars. But also about this time a change was underway in the second person which saw the old singular thou replaced with the plural you, giving us the modern paradigm as follows:
| Singular | Plural |
|---|---|
| I was | we were |
| you were | you were |
| he/she/it was | they were |
So what’s going on in Amelia? Here, we see almost exclusive use of you was: 33 instances, against only 1 of you were. (All of these are singular or ambiguous as to number; that is to say, there are no clear instances of plural you with a past tense of be.)
This looks like a case of a historical process called analogy. By this point, you has taken over as pretty much the sole second-person pronoun, replacing thou in the singular. (There are no instances of thou wast, and across the whole text the older forms thou, thee and ye are very much less frequent than you, which makes up about 98% of uses of second-person pronouns.) But this creates a disuniformity in the paradigm, one which is still present in standard English today: were is no longer an exclusively plural form. Some speakers in Fielding’s time clearly decided to get around this by extending was to (singular) you:
| Singular | Plural |
|---|---|
| I was | we were |
| you was | you were |
| he/she/it was | they were |
This paradigm makes was the sole form in the singular, and reinstates were as the only form in the plural. (Of course we can’t tell on the basis of the Amelia data alone if you were was retained in the plural, as we don’t definitively have any relevant uses of plural you, but there is evidence from other texts from the same time that this was the case.)
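The "disuniformity" can be made concrete with a small Python sketch. It simply encodes the three paradigms given above as dictionaries and checks which verb forms turn up in both the singular and the plural. The dictionary layout and the name `forms_shared_across_number` are my own assumptions for illustration.

```python
# The three past-tense paradigms of "to be" discussed above.
# Keys and layout are illustrative; the forms come from the tables in the text.
PARADIGMS = {
    "sixteenth century": {
        "singular": {"I": "was", "thou": "wast", "he/she/it": "was"},
        "plural": {"we": "were", "ye/you": "were", "they": "were"},
    },
    "modern standard": {
        "singular": {"I": "was", "you": "were", "he/she/it": "was"},
        "plural": {"we": "were", "you": "were", "they": "were"},
    },
    "you was": {
        "singular": {"I": "was", "you": "was", "he/she/it": "was"},
        "plural": {"we": "were", "you": "were", "they": "were"},
    },
}


def forms_shared_across_number(paradigm: dict) -> set:
    """Return the verb forms used in both the singular and the plural."""
    singular_forms = set(paradigm["singular"].values())
    plural_forms = set(paradigm["plural"].values())
    return singular_forms & plural_forms


for name, paradigm in PARADIGMS.items():
    shared = forms_shared_across_number(paradigm)
    print(name, "->", sorted(shared) if shared else "no overlap")
```

Only the modern standard paradigm has a form (were) doing double duty across singular and plural. Both the older paradigm and the you was paradigm keep the two numbers apart, which is the regularity that analogy restores.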
But what’s also interesting is that this change didn’t persist. At some point the trend toward you was was reversed, and standard English went back to you were. This illustrates that changes in language don’t proceed inexorably toward some end goal: they can be, and sometimes are, halted midway. It has been suggested that, in this particular case, the change may have been reversed through the influence of an important 1762 grammar by Robert Lowth, which condemned the use of you was. In general, perhaps, prescriptive attitudes don’t have that much of an influence in terms of preventing changes in the long term, but maybe in this rare instance we should give prescriptivists reason to take heart: perhaps their efforts aren’t utterly futile after all!
This post is about person in grammar, a notion that we split up into (at least) three categories called “first”, “second” and “third” person. In English, there are personal pronouns corresponding to each of these categories, like I, you, or she.
There are many things to be said about pronouns and person, but I’ll focus on one that I find particularly interesting and that figures prominently in my dissertation: the degree to which the person of a subject and an object can influence the form of a predicate or the form of the subject or the object itself in different languages (see also Jim Baker’s excellent related posts here and here).
While English has person and different pronouns, its verbal morphology is not very interesting, so I will start with Hungarian. We’ll see some slightly complicated verb forms first, and when you’re all confused, I’ll tell you about a beautifully simple way in which person influences verb forms in Hungarian.
Consider the examples in (1). The verb form with a first person subject in (1a) differs from the form with a third person subject in (1b) (in both Hungarian and English). This is called “subject agreement”, since the verb “agrees” with the subject. (A word about the examples: the first line shows the example in the language we’re talking about, the second line provides some grammatical information and English translations. “1SG” means “first person singular”, for example. The third line provides a full translation.)
(1) Hungarian
    a. Én lát-ok.
       I  see-1SG
       ‘I see.’
    b. Ő    lát.
       s/he see.3SG
       ‘S/he sees.’
But Hungarian verbs do not only indicate the person of the subject, they can also indicate the person of the object. We can see this if we add a definite object, a third person pronoun in this case, to the sentences above. Now “1SG>3” means “first person singular subject and third person object”. Cool, right? (Note also the case ending on the object: “ACC” for “accusative”.)
(2) Hungarian
    a. Én lát-om    ő-t.
       I  see-1SG>3 s/he-ACC
       ‘I see him/her.’
    b. Ő    lát-ja    ő-t.
       s/he see-3SG>3 s/he-ACC
       ‘S/he sees him/her.’
If we look at the English verbs, we see that their forms differ based on whether the subject is first or third person, but it doesn’t make a difference whether they have an object (say a pronoun like her) or not. In other words, for each (subject) person, there is only one form in English per tense. In Hungarian, there are several forms: a verb can agree with the subject only, as in (1), or with the subject and the object, as in (2). To make things even more fun for learners and linguists alike, this only happens with some objects, though.
In another interesting spin, it depends on the person of both the subject and the object whether both are indicated on the verb. If the person of the subject is first person and the object is third person, like above, the verb seems to indicate both (in other words, the verb shows subject and object agreement; I’ll indicate this as “1>3”).
What happens in other persons? When the subject is third person and the object is first person (3>1), does the verb also show subject and object agreement? It does not!
(3) Hungarian
   Ő lát engem.
   s/he see.3SG me
   ‘S/he sees me.’
If we compare the verb form in (3) with the one in (1b), they are the same: with a third person subject, it does not matter whether the verb has a first person object or no object at all. It shows the same form lát meaning ‘s/he sees’.
Confusing, right? One more thing on Hungarian, though: the best way to show the sensitivity to the person of both the subject and the object is with second person objects:
(4) Hungarian
a. Én lát-lak téged.
   I see-1SG>2 you
   ‘I see you.’
b. Ő lát téged.
   s/he see.3SG you
   ‘S/he sees you.’
In (4b), the verb only shows the person of the subject (as in (3)), but in (4a), the verb shows the person of the subject and of the object: -lak only appears with first person subjects and second person objects. The two sentences in (4) have the same object, but whether the verb shows object agreement depends on the person of the subject!
OK, so what’s the beautiful pattern behind all this? Consider this so-called “hierarchy”:
(5) 1 > 2 > 3
To decide whether a Hungarian verb shows agreement with both the subject and the object, we have to look at whether the person of the subject is higher than the person of the object. If this is the case, say with a first person subject and a second person object, or 1>2, we see agreement. This kind of configuration is called “direct”.
But if the object’s person is higher, say a third person subject and a first person object, or 3>1, there is no agreement in Hungarian. Such configurations are called “inverse”. (This is not quite the whole story for Hungarian, but it’s the general pattern. There are some references below if you’re really interested).
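The comparison described above can be written down as a tiny sketch. The function names are my own, and the rule is only the simplified one from this post: pairs with the same person on both sides, such as 3>3 in (2b), are not settled by comparing ranks, so the sketch leaves them undecided rather than guessing.

```python
# Person hierarchy from (5): 1 > 2 > 3. A lower number means a higher rank.
def configuration(subject: int, obj: int) -> str:
    """Classify a subject/object person pair as 'direct', 'inverse' or 'equal'."""
    for person in (subject, obj):
        if person not in (1, 2, 3):
            raise ValueError(f"person must be 1, 2 or 3, got {person}")
    if subject < obj:
        return "direct"   # subject outranks object, e.g. 1>2 or 1>3
    if subject > obj:
        return "inverse"  # object outranks subject, e.g. 3>1 or 3>2
    return "equal"        # same person on both sides, e.g. 3>3


def shows_object_agreement(subject: int, obj: int):
    """Simplified rule from the text: agreement in direct configurations,
    none in inverse ones. Equal-person pairs are left undecided (None)."""
    kind = configuration(subject, obj)
    if kind == "equal":
        return None
    return kind == "direct"


# The four configurations illustrated in (2a), (3), (4a) and (4b).
for subj, obj in [(1, 3), (3, 1), (1, 2), (3, 2)]:
    print(f"{subj}>{obj}: {configuration(subj, obj)}, "
          f"object agreement: {shows_object_agreement(subj, obj)}")
```

For the pairs in the loop, the sketch gives agreement for 1>3 and 1>2 and no agreement for 3>1 and 3>2, which matches the Hungarian examples.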
So far, this was about agreement, i.e. the form of the verb. However, the same hierarchy also influences the form of the subject or the object in some languages. In Kashmiri, for example, the case of the direct object depends on the person of both the subject and the object. If the object’s person is higher than the subject’s person, the object appears in object case (like him or her in English, as opposed to he and she). Compare the following examples:
(6) Kashmiri
a. bı chusath tsı parınaːvaːn
   I.SUBJ am you.SUBJ teaching
   ‘I am teaching you.’
b. tsı chukh me parınaːvaːn
   you.SUBJ are I.OBJ teaching
   ‘You are teaching me.’
In the first one, the subject’s person is higher than the object’s: 1>2, a direct configuration. Therefore the object is in its subject form tsı, i.e. the same form that serves as the subject in example (6b). In that example, the person of the subject (2) is lower than the person of the object (1), and therefore the object has object case, i.e. me meaning, well, ‘me’ (as opposed to bı in (6a) meaning ‘I’).
To show that the same thing holds for second person, let’s see the following examples. In the first example, the subject’s person (2) is higher than the object’s (3), and therefore the object has subject case (compare su in both sentences!). In the second example, however, the object’s person is higher, and the object therefore shows up in object case (tse rather than tsı).
(7) Kashmiri
a. tsı chihan su parınaːvaːn
   you.SUBJ are he.SUBJ teaching
   ‘You are teaching him.’ (literally something like ‘you are teaching he’)
b. su chuy tse parınaːvaːn
   he.SUBJ is you.OBJ teaching
   ‘He is teaching you.’
Again, the hierarchy in (5), 1 > 2 > 3, gives us a way to describe what’s going on: only if the object’s person is higher than the subject’s does the object show case-marking. In other words, the object shows case-marking in inverse configurations.
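As a rough sketch of the Kashmiri pattern, the rule can be paired with the pronoun forms that appear in (6) and (7). The function names are my own, and the lookup tables only contain the forms attested in those examples; no third person object-case form appears there, so none is listed.

```python
# Pronoun forms taken from examples (6) and (7). The third person object-case
# form does not appear in the examples, so it is deliberately left out.
SUBJECT_CASE = {1: "bı", 2: "tsı", 3: "su"}
OBJECT_CASE = {1: "me", 2: "tse"}


def object_case(subject: int, obj: int) -> str:
    """Return 'OBJ' in inverse configurations (the object outranks the
    subject on 1 > 2 > 3) and 'SUBJ' otherwise."""
    return "OBJ" if obj < subject else "SUBJ"


def object_form(subject: int, obj: int) -> str:
    """Look up the pronoun form the object takes for a given subject person."""
    forms = OBJECT_CASE if object_case(subject, obj) == "OBJ" else SUBJECT_CASE
    return forms[obj]


# The four configurations from (6a), (6b), (7a) and (7b).
for subj, obj in [(1, 2), (2, 1), (2, 3), (3, 2)]:
    print(f"{subj}>{obj}: object in {object_case(subj, obj)} case, "
          f"form {object_form(subj, obj)}")
```

Since a third person object can never outrank its subject on this hierarchy, the sketch never needs the missing third person object-case form.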
There are many other examples of similar patterns across the world: some Native American languages have both case-marking (like Kashmiri) and verb forms (a bit like Hungarian, but more complex) that differ depending on the person of the subject and the object.
To give a final example, the language Awtuw, spoken in Papua New Guinea, requires that some objects appear in object case, but this does not just depend on person, but also on whether the object is more “animate” than the subject. And you need to know that humans count as more animate than animals, in this language.
(8) Awtuw
a. Tey tale-re yaw dæli
   the woman-OBJ pig bit
   ‘The pig bit the woman.’
b. Tey tale yaw dæli
   the woman pig bit
   ‘The woman bit the pig.’
According to Feldman’s grammar of Awtuw, the more animate, human argument (the woman) can only be the object if it is specially marked by the suffix -re. Rather than looking at person directly, for Awtuw we seem to need a hierarchy based on humanness and animacy:
(9) human > animate
Is there a way to combine humanness or animacy and person? Many linguists think so! They suggest that hierarchies are quite large, like the one in (10), and that they incorporate both person and humanness.
(10) 1 > 2 > 3 > human > animate > inanimate
Languages differ in how they lump several levels together: in Hungarian, humanness or animacy do not play a role in determining the form of the verb, for example. In Awtuw, on the other hand, they do in determining the form of the object. And obviously, many languages do not show these effects at all.
Languages obviously differ in whether such hierarchies influence agreement or case morphology, both, or neither, but there are nevertheless some very interesting generalisations that seem to hold across languages. “Special” marking like object case in Kashmiri or Awtuw tends to appear when the object’s person (or animacy) is higher than the subject’s but not when the subject’s person or animacy is higher than the object’s. It seems that direct configurations are “the norm”, while inverse configurations are “special”.
Why should this happen so regularly?
Some linguists suggest that the most typical subjects in transitive clauses tend to be high on the hierarchy, while the most typical objects tend to be low. Constructions that diverge from this norm are therefore expressed in a special way.
Another way to describe hierarchies is to assume that “1” and “2” represent more complex notions: 1 stands for the features “speaker, participant, person”, whereas 2 stands for “participant, person”, and 3 merely for “person”. This way of defining “person” makes first person the most specific and third person the least specific. First person always includes the speaker, but the reference of third person is much, much less restricted, and this might be a way of capturing this specificity in reference — and the fact that first and second person tend to behave in more “special” ways than third person.
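One way to picture this feature-based view is to treat each person as a set of features, so that "more specific" simply means "has all the features of the other person and more". This is only an illustrative sketch with names of my own choosing, not the formalism used in the literature.

```python
# Each person as a bundle of features, following the description above.
PERSON_FEATURES = {
    1: frozenset({"speaker", "participant", "person"}),
    2: frozenset({"participant", "person"}),
    3: frozenset({"person"}),
}


def is_more_specific(a: int, b: int) -> bool:
    """True if person a has every feature of person b plus at least one more."""
    return PERSON_FEATURES[a] > PERSON_FEATURES[b]  # proper superset test


# Ordering the persons by number of features recovers the hierarchy in (5).
ranking = sorted(PERSON_FEATURES, key=lambda p: len(PERSON_FEATURES[p]), reverse=True)
print(" > ".join(str(p) for p in ranking))  # prints "1 > 2 > 3"
```

On this view the hierarchy does not have to be stipulated separately: it falls out of which feature sets contain which.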
To sum up, person, as inconspicuous as it is in English grammar, does fascinating things in languages all over the world, leading to case-marking here and agreement there — or in fact making certain sentences impossible. Jelinek and Carnie (2003) report of the Native American language Lummi that it is not possible to say “he advised us” (with a first person object):
“Speakers produce the example sentences comfortably until they are asked to say ‘He advised us’. Then they stop, look surprised and uneasy, and then if they are good consultants, after a while may say something like ‘Well, we don’t say it that way. You might say ‘We were advised’, but it’s not really the same, is it?’”
I decided to keep references out of the text to make it more legible. Here they come.
If you’re interested in literature about Hungarian agreement, here are some recent papers that include further references:
Bárány, András (2015). ‘Inverse agreement and Hungarian verb paradigms’. In: Approaches to Hungarian: Volume 14, Papers from the 2013 Piliscsaba Conference. Ed. by Katalin É. Kiss, Balázs Surányi and Éva Dékány. Amsterdam/Philadelphia: John Benjamins, 37–64. doi: 10.1075/atoh.14.02bar.
Coppock, Elizabeth (2013). ‘A semantic solution to the problem of Hungarian object agreement’. Natural Language Semantics 21.4, 345–371. doi: 10.1007/s11050-013-9096-7.
Coppock, Elizabeth and Stephen Wechsler (2012). ‘The objective conjugation in Hungarian: Agreement without phi-features’. Natural Language & Linguistic Theory 30.3, 699–740. doi: 10.1007/s11049-012-9165-5.
É. Kiss, Katalin (2013). ‘The Inverse Agreement Constraint in Uralic Languages’. Finno-Ugric Languages and Linguistics 2.3, 2–21.
For Kashmiri, Awtuw and Lummi, see the following references. Examples (6) and (7) are from Wali and Koul (1997: 155); example (8) is from Feldman (1986: 110). The quote about Lummi is from Jelinek and Carnie (2003: 285).
Feldman, Harry (1986). A Grammar of Awtuw. Canberra: The Australian National University.
Jelinek, Eloise and Andrew Carnie (2003). ‘Argument hierarchies and the mapping principle’. In: Formal Approaches to Function in Grammar: In honor of Eloise Jelinek. Ed. by Andrew Carnie, Heidi Harley and MaryAnn Willie. Amsterdam/Philadelphia: John Benjamins, 265–296. doi: 10.1075/la.62.20jel.
Wali, Kashi and Omkar N. Koul (1997). Kashmiri: A cognitive-descriptive grammar. New York: Routledge.
Finally, there is some more about hierarchies and “typical” subject-object configurations here, …
Aissen, Judith (2003). ‘Differential object marking: iconicity vs. economy’. Natural Language & Linguistic Theory 21.3, 435–483. doi: 10.1023/A:1024109008573.
Comrie, Bernard (1989). Language Universals and Linguistic Typology. 2nd edition. Chicago: University of Chicago Press.
Keine, Stefan (2010). Case and Agreement from Fringe to Core: A Minimalist Approach. Berlin/New York: De Gruyter.
Richards, Marc (2008). ‘Defective Agree, Case Alternations, and the Prominence of Person’. In: Scales. Ed. by Marc Richards and Andrej L. Malchukov. Linguistische Arbeitsberichte 86. Universität Leipzig, 137–161.
… while the following paper is about treating person as more complex categories.
Harley, Heidi and Elizabeth Ritter (2002). ‘Person and Number in Pronouns: A Feature-Geometric Analysis’. Language 78.3, 482–526. doi: 10.1353/lan.2002.0158.
On the 13th of November UCL’s Deafness, Cognition and Language Research Centre (DCAL) celebrated its 10th anniversary. In ten years DCAL has had a profound effect on a number of areas, from Clinical Psychology to Education. One of the most exciting projects from a linguist’s perspective is probably their British Sign Language (BSL) Corpus Project. Before 2008 there was no large accessible collection of BSL signing. DCAL decided to address this gap and set out to collect signing data from Deaf participants from different areas of the UK. Ultimately signing data was collected from 249 Deaf people in 8 cities (London, Bristol, Birmingham, Manchester, Newcastle, Glasgow, Cardiff and Belfast). Within these signers there were also different genders, ages, ethnic groups and occupations represented. Participants were interviewed, held conversations with other signers and were asked to provide their preferred sign for 102 different concepts (e.g. ‘America’ or ‘dog’). This gave DCAL a wealth of signing data unlike anything ever collected on BSL before.
So, why is this important?
Traditionally, we’ve found out about variation in how people speak—whether that be variation between people in different places, of different classes, genders, or whatever—by doing surveys. Dialectologists have travelled around the country interviewing a few people in each town to record how each would say a set of words. Sociolinguists have interviewed wide ranges of people from different educational and social backgrounds and looked for differences in how they speak. These sorts of methods have been very successful—but they’re also very costly. Sending out researchers to do dialectological surveys is an expensive business: many researchers are needed to carry out the long process of getting to know local people and finding some who are willing to be interviewed in every locality and all those researchers have to be paid for their time and travel. The reality is, there just hasn’t been the funding in humanities and social science research to do this sort of work on a large scale for some years and so much of our data is rather out of date.
But in the era of the internet and ‘Big Data’ there’s a new way of finding out about language variation: using social media. And so a new generation of research into language variation using language data from social media is just starting to appear.
Using social media data for research is a very different proposition to traditional survey data. Obviously, it’s mostly written rather than spoken data, which immediately puts some limits on the sorts of things it can tell us. More problematically, you can rarely find out as much information about each person in your study as in a traditional survey, and even what information you can find out is unreliable. As an interviewer in person the researcher can ask for more information when needed: ‘You say you’re from York—were you born and brought up there, or did you move around as a child? Were your parents also from York?’ But dealing with online data, the vast majority of the time what you see is all you get. You know what the user chose to write in the ‘Hometown’ box but not necessarily what they meant by it. You know where their phone was when they tweeted—but you don’t know if that’s the place that they live and were brought up, or indeed whether those are the same places.
Nevertheless, there is one big advantage to this sort of data: there’s lots of it. And a big enough quantity of data can often make up for low quality data, if we’re asking the right questions. Because of the uncertainties about who’s really behind the keyboard, we can rarely use social media to make definitive statements about how much a given group of people speaks or writes in a certain way (that would be statements like ‘people under 25 from London use the word order “give it me” 50% of the time and “give me it” 50% of the time’)—but we can make comparative statements (like ‘people from London use the word order “give it me” twice as often as people from Lincolnshire’).
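To make the difference between the two kinds of statement concrete, here is a small sketch with invented counts. The numbers and function names are purely hypothetical and do not come from any study. The shares within each region are the shaky "definitive" figures, while the ratio between regions is the comparative statement, which is more robust if the noise in the metadata affects both regions in a similar way.

```python
# Hypothetical tweet counts per region; these numbers are invented for
# illustration and do not come from any study.
counts = {
    "London":       {"give it me": 200, "give me it": 800},
    "Lincolnshire": {"give it me": 100, "give me it": 900},
}


def share(region: str, variant: str) -> float:
    """Proportion of a region's tokens that use the given variant."""
    total = sum(counts[region].values())
    return counts[region][variant] / total


def relative_rate(variant: str, region_a: str, region_b: str) -> float:
    """How many times more often region_a uses the variant than region_b."""
    return share(region_a, variant) / share(region_b, variant)


ratio = relative_rate("give it me", "London", "Lincolnshire")
print(f"London uses 'give it me' {ratio:.1f} times as often as Lincolnshire")
```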
To exemplify what sort of work is being done with social media at the moment, I’ll take you through a couple of interesting recent papers (links to both are found at the bottom of the post). Gonçalves & Sánchez (2014) gathered around 50,000,000 tweets written in Spanish and associated with a GPS location over two years. They then tracked lexical variation—variation in the words people choose to use to describe a given concept—to see if they could find differences in people’s language use associated with different places. The map below is reproduced from their paper, showing the different words used for ‘car’. As you can see, five distinct areas emerge: people in North America and northern South America largely use ‘carro’; people in Central America and in Spain usually use ‘coche’; and people in the southern half of South America generally use ‘auto’.
They then took results like this for many words and used machine learning algorithms (specifically K-means clustering) to investigate whether there were identifiable groups of dialects. The result was very surprising. Instead of showing big, regional dialects associated with contiguous areas on the map, the algorithm identified just two dialects: one associated with the big urban areas and one with everywhere else. Gonçalves & Sánchez write: “Superdialect α is utilized by speakers in main American and Spanish cities and corresponds to an international variety with a strongly urban component while superdialect β is comprised mostly of rural areas and small towns” (6). They see this as evidence for the homogenising effect of globalisation on language.
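For readers who have not met K-means before, here is a minimal sketch of the idea in plain Python. It is not the authors' pipeline: the place names and usage shares are invented, and starting from the first k points is a simplification of my own (real implementations usually pick starting points at random and run several times). Each place is described by the share of its tweets that use each of three competing words, and the algorithm groups places with similar profiles.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its points. Starts from the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: the index of the closest centroid for every point.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical places with invented shares of three word variants.
places = {
    "city_A": (0.80, 0.15, 0.05),
    "town_B": (0.10, 0.20, 0.70),
    "city_C": (0.75, 0.20, 0.05),
    "town_D": (0.15, 0.15, 0.70),
    "city_E": (0.70, 0.25, 0.05),
}

labels, centroids = kmeans(list(places.values()), k=2)
for name, label in zip(places, labels):
    print(f"{name}: cluster {label}")
```

In this toy data the three "cities" end up in one cluster and the two "towns" in the other, because the grouping depends only on how similar the word profiles are and not on where the places sit on a map.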
Eisenstein et al. (2014) focused not on the static facts of whole dialects but on fast-paced processes of change associated with new words entering the language. They collected a corpus of 107,000,000 tweets in English from 2009 to 2012 and looked only at words whose frequencies changed significantly over time. Below is an example, reproduced from their paper. It shows the expansion of the term ion (short for ‘I don’t’ as in ‘ion even care’) over a 150-week period.
One interesting finding which is immediately clear from such figures is that even for these sorts of words which are fundamentally written and exist (basically) only online, geography is relevant. On the face of it, we might expect words on the internet to spread randomly across space, as most of what is posted is publicly visible regardless of where you are. But the reality is that words basically spread through social networks, and these exist in real space, even if we’re watching them in action online.
Eisenstein et al. go on to examine the most common routes of linguistic diffusion, mapping the paths most often taken by new words between the cities, and then investigate what factors favour such linguistic pathways. They found that racial demographics were crucially important: linguistic innovations were more likely to be transmitted between cities with similar proportions of African American citizens and Hispanic citizens. Small geographic distance, similar proportions of residents in urbanised areas and similar median incomes also facilitated linguistic influence. Population also had an effect: larger settlements were more likely to exert influence than be subject to it.
These two studies are just a small intimation of the potential for linguistic research with social media, but hopefully you can start to see what an exciting area this promises to be!
Eisenstein, Jacob, Brendan O’Connor, Noah A. Smith & Eric P. Xing. 2014. Diffusion of Lexical Change in Social Media. PLoS ONE 9(11). e113114. doi:10.1371/journal.pone.
Gonçalves, Bruno & David Sánchez. 2014. Crowdsourcing Dialect Characterization through Twitter. PLoS ONE 9(11). e112074. doi:10.1371/journal.pone.0112074.
As a linguist, I often fail to match my non-linguist friends in how-cool-is-my-degree anecdotes: Japanese word order alternations just don’t have the same shock effect as a budding doctor declaring how formaldehyde makes them hungry during dissections, and Middle English sound changes don’t make you quite as hip as the classicist indulging in ancient Bacchanalia. But a few weeks back, I enjoyed a rare moment of subject coolness when I declared (to the hip classicist, as it happens) that – brace for impact – Finnish only has one word, hän, for both he and she. Nor is it the only one of its kind.
For a moment, I felt that the ensuing silence, followed by somewhat excessive OMG-nowaying (at which point I was seriously considering offering the poor classicist a paper bag to prevent hyperventilation) was perhaps veering into the overreacting side of things. However, a gender-neutral third-person pronoun has glimmered as the Holy Grail of linguistic equality in the minds of generations of activists and regularly crops up in newspaper headlines – not an insignificant subject to get excited about. English, for instance, has seen suggestions ranging from hu and peh to xe, jee and many more, as alternatives for unifying he and she.
Why bother about such minuscule issues? Haven’t English-speakers been quite content with the distinction since, well, English began? Part of the answer is stylistic: everyone knows the awkwardness of conscientiously typing he/she whenever reference is not specified, turning the prose of budding Shakespeares into a satire of political correctness, or the cumbersome singular they shot down by prescriptivists. The other part, as any self-respecting feminist in the footsteps of Beauvoir will point out, is that language is a tool of, mostly patriarchal, power – more often than not, masculine terms carry the connotations of the standard and the positive. The list is continued by the question of how to refer to those who do not identify themselves in the gender binary, and cases where the use of he has turned androgynous entities into gendered beings (never imagined the Christian God as a bearded fellow on the rim of a cloud?).
It may all seem like a never-ending debate conducted from ivory towers but in 2012, Sweden saw gender-egalitarian fuel thrown into its pronominal flames as the first children’s book with hen, the proposed gender-neutral equivalent for feminine hon and masculine han, was published. In Kivi och Monsterhund, the protagonist Kivi has no specified gender and it is left to the reader to choose how they see, uhm, them. However, in the wonderland of equality where even the main airport features unisex toilets and where hen has finally achieved dictionary-status this year, the opposition has raised a surprisingly animated and even imaginative counter-attack, not least because of underlying seeds of misunderstanding: it is only the extremists who want to see the gender distinction fully erased from pronouns whereas the majority regard hen as an opportunity to avoid awkward problems of reference. Sadly enough, it is the extremist view that has gained the most eager attention.
Some say hen confuses language use, and the major newspaper Dagens Nyheter banned its use for the same reason; others claim it confuses not only language but also their children’s gender identity. Those inclined to think in terms of conspiracy theories see it all as one big feminist plot to erase gender not only from language but humankind (NOT mankind, please) as well. And as hen has a meaning quite different in English, oh the irony of gender-neutral hens!
However, according to the author, Jesper Lundqvist, Kivi the gender-neutral protagonist is less of a feminist tour de force and more of a creative possibility. No proverbial governments have been taken over: children still assign gender, usually their own, when reciting the tale. And while Lundqvist admits to using increasing numbers of hen in casual speech, those still fearing the loss of their gender may rest assured for Kivi does have gender-specific parents – the good old mother and father.
Yet all is not well in allegedly gender-neutral language paradises, either, where, perhaps surprisingly, the opposite trend is emerging. Look east of Sweden, and Finnish language enthusiasts strike back. Finnish may well have only gender-neutral pronouns, hän for people and se for things as well as people, but there is a yearning, however faint and underground, for something gender-specific. The original translation of Joyce’s Ulysses, for instance, was criticized for using the single pronoun in situations where the original English she/he distinction clarifies reference, while the most recent version has been furnished with the additional, made-up hen for feminine reference – likewise the object of unhappy feelings aplenty.
What about greater gender equality propelled by hän? Would the introduction of a gender-specific pronoun not undermine the goals of gender-neutral language enthusiasts? In language, certainly no stylistic issues arise with hän when the referent could be of any, or no, gender, and in society, women were given the right to vote second in the world (which is slightly shadowed by the fact that gender-specific English-speaking New Zealand got there first). However, all these enthusiasts’ arguments reduce to nothing but wishful thinking in the face of studies showing that people uniformly manifest predominantly male associations with hän – only girls under school age will see it as female. So much for gender-neutral socialization through a single pronoun.
Risking a blow to my subject coolness, I had to inform the classicist of a further anticlimax: what seems to be constantly forgotten in these debates is that hen and its peer neologisms, whatever their social impact, are very unlikely to gain ground other than as stylistic alternatives for the already existing pronouns. Studies into the history of languages show repeatedly that the basic ingredients of language are the most resistant to change and as such cannot be changed by willpower. Just think about it: can you imagine yourself saying thon or ze instead of he and she without the ridicule carried by so many expressions created to be politically correct?
Deep breaths. English pronouns aren’t going gender-neutral any time soon.