<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Ideophone &#187; Visualization</title>
	<atom:link href="http://ideophone.org/topics/visualization/feed/" rel="self" type="application/rss+xml" />
	<link>http://ideophone.org</link>
	<description>Sounding out ideas on African languages, vivid sensory words, and iconicity</description>
	<lastBuildDate>Thu, 26 Apr 2012 15:33:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Visualizations</title>
		<link>http://ideophone.org/visualizations/</link>
		<comments>http://ideophone.org/visualizations/#comments</comments>
		<pubDate>Fri, 07 Nov 2008 13:46:59 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[Linguistics]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://ideophone.org/?p=114</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Visualizations&amp;rft.aulast=Dingemanse&amp;rft.aufirst=Mark&amp;rft.subject=Linguistics&amp;rft.subject=Visualization&amp;rft.source=The+Ideophone&amp;rft.date=2008-11-07&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://ideophone.org/visualizations/&amp;rft.language=English"></span>
Via Language Log, a nice tutorial titled Interactive Visualization for Computational Linguistics [PDF, 13.1 MB] by Christopher Collins, Gerald Penn, and Sheelagh Carpendale. It includes not only lots of wonderful visualizations but also plenty of background information on Gestalt perception, &#8230; <a href="http://ideophone.org/visualizations/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Visualizations&amp;rft.aulast=Dingemanse&amp;rft.aufirst=Mark&amp;rft.subject=Linguistics&amp;rft.subject=Visualization&amp;rft.source=The+Ideophone&amp;rft.date=2008-11-07&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://ideophone.org/visualizations/&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://ideophone.org/?p=114"><!-- &nbsp; --></abbr>
<p>Via <a href="http://languagelog.ldc.upenn.edu/nll/?p=815">Language Log</a>, a nice tutorial titled <em>Interactive Visualization for Computational Linguistics</em> [<a href='http://www.cs.utoronto.ca/~ccollins/acl2008-vis.pdf'>PDF, 13.1 MB</a>] by Christopher Collins, Gerald Penn, and Sheelagh Carpendale. It includes not only lots of wonderful visualizations but also plenty of background information on Gestalt perception, visualizations as &#8216;external cognition&#8217;, preattentive processing, a case study (slide 196ff.), and ample examples of different kinds of visualization software. See also <a href='http://www.infovis-wiki.net/index.php?title=Linguistic_Visualization'>InfoVis:Wiki &mdash; Linguistic Visualization</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ideophone.org/visualizations/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Wordle now does Extended Latin and diacritics</title>
		<link>http://ideophone.org/wordle-extended/</link>
		<comments>http://ideophone.org/wordle-extended/#comments</comments>
		<pubDate>Wed, 22 Oct 2008 15:15:41 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[Linguistics]]></category>
		<category><![CDATA[Siwu]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://ideophone.org/?p=108</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Wordle+now+does+Extended+Latin+and+diacritics&amp;rft.aulast=Dingemanse&amp;rft.aufirst=Mark&amp;rft.subject=Linguistics&amp;rft.subject=Siwu&amp;rft.subject=Software&amp;rft.subject=Visualization&amp;rft.source=The+Ideophone&amp;rft.date=2008-10-22&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://ideophone.org/wordle-extended/&amp;rft.language=English"></span>
Great news for those who are into visual corpus linguistics but don&#8217;t work on SAE languages: since July, Wordle handles alphabets in the Extended Latin ranges; and today its maker, Jonathan Feinberg, added support for combining diacritics. That means that &#8230; <a href="http://ideophone.org/wordle-extended/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Wordle+now+does+Extended+Latin+and+diacritics&amp;rft.aulast=Dingemanse&amp;rft.aufirst=Mark&amp;rft.subject=Linguistics&amp;rft.subject=Siwu&amp;rft.subject=Software&amp;rft.subject=Visualization&amp;rft.source=The+Ideophone&amp;rft.date=2008-10-22&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://ideophone.org/wordle-extended/&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://ideophone.org/?p=108"><!-- &nbsp; --></abbr>
<p>Great news for those who are into <a href="http://ideophone.org/visual-corpus-linguistics/">visual corpus linguistics</a> but don&#8217;t work on <abbr title="Standard Average European">SAE</abbr> languages: since July, <a href='http://wordle.net'>Wordle</a> handles alphabets in the Extended Latin ranges; and today its maker, Jonathan Feinberg, added support for combining diacritics. That means that you can now feed Wordle texts from languages that use tone marks and other diacritics in their orthographies. Like Siwu.</p>
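<p>Wordle&#8217;s internals are not described here, so the following is only a sketch of the underlying problem, using Python&#8217;s standard <code>unicodedata</code> module. A tone mark such as the grave accent in <em>ɔ̀turi</em> is a separate combining character (Unicode category Mn), and because no precomposed &#8216;ɔ with grave&#8217; exists, NFC normalization cannot merge it into the letter. A tokenizer that only accepts letters and digits, such as Python&#8217;s <code>\w</code> regex class, therefore cuts the word in two. The hypothetical <code>tokenize</code> function below keeps combining marks attached to the preceding letter.</p>

```python
import re
import unicodedata


def tokenize(text):
    """Split text into word tokens, keeping combining marks (Unicode
    categories Mn/Mc/Me, e.g. tone diacritics) attached to their base letter."""
    # NFC composes base+mark pairs where a precomposed character exists;
    # pairs such as open o + combining grave have none and stay decomposed.
    text = unicodedata.normalize("NFC", text)
    tokens, current = [], []
    for ch in text:
        if ch.isalnum() or unicodedata.category(ch).startswith("M"):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


# "ɔ̀turi gɔ lokpi" ('the man who died'), written with escapes so that the
# combining grave accent (U+0300) is visible in the source.
sample = "\u0254\u0300turi g\u0254 lokpi"
print(re.findall(r"\w+", sample))  # naive: the tone mark splits the first word
print(tokenize(sample))            # keeps the tone-marked word whole
```

<p>Checking the Unicode category rather than listing specific diacritics means that tone marks, nasalization tildes, and any other combining marks are handled the same way.</p>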
<div class='img img-full'>
<img src="http://ideophone.org/files/wordle-siwu-1.png" alt="" title="wordle-siwu-1" />
<div><a href='http://wordle.net'>Wordle</a> based on some ten minutes of spontaneous conversation in Siwu.</div>
</div>
<p>The Wordle above displays the most common words in some ten minutes of spontaneous conversation in Siwu, one of the fruits of my last field trip. The conversation has four participants. There is nothing groundbreaking about this particular Wordle; it&#8217;s just a nice word cloud starring:</p>
<ul>
<li><strong class='langdata'>kùɖu</strong> &#8216;gunpowder&#8217;, the main topic of the conversation since this was actually videotaped while the four participants were manufacturing a local type of gunpowder;</li>
<li><strong class='langdata'>sɔ</strong>, the all-purpose quotative/complementizer &mdash; (X) (says) <em>that</em> Y&#8230;;</li>
<li><strong class='langdata'>gɔ</strong>, the relative pronoun for the animate singular class, as in <em class=' langdata'>ɔ̀turi gɔ lokpi</em> &#8216;the man <em>who</em> died&#8217; &mdash; this indicates that a lot of the talk is about persons;</li>
<li><strong class='langdata'>mm</strong>, a backchannel cue signalling involvement and attention;</li>
<li><strong class='langdata'>fɔ</strong>, an emphatic 2nd person pronoun;</li>
<li><strong class='langdata'>kɔ̃rɔ</strong> &#8216;right now&#8217;, a filler word used roughly like English <em>now</em> in &#8216;now we went there and guess what happened&#8230;&#8217;.</li>
</ul>
<p>What this Wordle cannot show are the differences in the participants&#8217; conversational strategies. Mr. Orange, for example, as I call him in my ELAN transcript of the conversation, is by far the main supplier of <em>mm</em> and its cousin <em>m-hm</em>; in fact, his repertoire is not much bigger than that. Rather than doing the talking, he prefers to play a supporting role in this particular conversation.</p>
<p>The other three are much more vocal and varied, not to mention much more expressive (ideophones are sprinkled all over the place!). But that&#8217;s all for another occasion. For now, cheers to the new Wordle!</p>
]]></content:encoded>
			<wfw:commentRss>http://ideophone.org/wordle-extended/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>More visualizations</title>
		<link>http://ideophone.org/more-visualizations/</link>
		<comments>http://ideophone.org/more-visualizations/#comments</comments>
		<pubDate>Tue, 24 Jun 2008 19:04:39 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[Linguistics]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://ideophone.org/?p=79</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=More+visualizations&amp;rft.aulast=Dingemanse&amp;rft.aufirst=Mark&amp;rft.subject=Linguistics&amp;rft.subject=Software&amp;rft.subject=Visualization&amp;rft.source=The+Ideophone&amp;rft.date=2008-06-24&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://ideophone.org/more-visualizations/&amp;rft.language=English"></span>
A visualization of the previous two posts on Many Eyes and Siwu ne. Because recursivity is a Good Thing, here is a visualization of the previous two posts on visualizing linguistic data with Many Eyes. The astute reader will note &#8230; <a href="http://ideophone.org/more-visualizations/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=More+visualizations&amp;rft.aulast=Dingemanse&amp;rft.aufirst=Mark&amp;rft.subject=Linguistics&amp;rft.subject=Software&amp;rft.subject=Visualization&amp;rft.source=The+Ideophone&amp;rft.date=2008-06-24&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://ideophone.org/more-visualizations/&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://ideophone.org/?p=79"><!-- &nbsp; --></abbr>
<div class='img img-full'>
<img src="http://ideophone.org/files/visualization.png" alt="ne" title="ne" />
<div>A visualization of the previous two posts on <em>Many Eyes</em> and Siwu <em>ne</em></div>
</div>
<p>Because recursivity is a Good Thing, here is a visualization of the previous <a href="http://ideophone.org/many-eyes-on-siwu-ne/">two</a> <a href="http://ideophone.org/visual-corpus-linguistics/">posts</a> on visualizing linguistic data with <em>Many Eyes</em>. The astute reader will note that the strange loop is not perfect since I didn&#8217;t use Many Eyes for the visualization &mdash; that is because nothing can do a simple visualization as beautifully as <a href='http://wordle.net'>Wordle</a>.</p>
<p>Unfortunately, Wordle doesn&#8217;t seem to handle Unicode outside the basic Latin range very well (probably because of the fancy fonts), otherwise I would&#8217;ve fed it some Siwu text, too. (I think Wordle could be made to work with SIL&#8217;s freely available <a href='http://www.sil.org/computing/catalog/show_software_catalog.asp?by=cat&#038;name=Font'>Unicode fonts</a>.)</p>
<h2>Queensland grammar scandal at a glance</h2>
<p>As an added bonus, here are the 75 most common content words from the recent discussion of the Queensland grammar scandal (sampled from <a href='http://languagelog.ldc.upenn.edu/nll/?p=239'>three</a> <a href='http://languagelog.ldc.upenn.edu/nll/?p=264'>verbose</a> <a href='http://languagelog.ldc.upenn.edu/nll/?p=269'>posts</a> at Language Log and from <a href='http://www.matjjin-nehen.com/2008/06/18/de-bellis-grammaticae/'>matjjin-nehen</a>, including comments). It won&#8217;t help the debate, but it does give you the brouhaha at a glance. Another different something function. Grammatical grammar. Australian grammar errors indeed.</p>
<div class='img img-full'>
<a href='http://wordle.net/gallery/wrdl/27809/Australian_grammar_errors'><img src="http://ideophone.org/files/brouhaha.png" alt="ne" title="ne" /></a>
<div>The 75 most common content words about the Queensland grammar brouhaha in the linguablogosphere</div>
</div>
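<p>This post does not describe how Wordle weights words or which stopwords it drops. As a rough sketch, assuming whitespace tokenization and a small hypothetical stopword list, picking the 75 most common content words comes down to a frequency count that skips function words:</p>

```python
import string
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "this"}


def top_content_words(text, n=75, stopwords=STOPWORDS):
    """Return the n most frequent (word, count) pairs, ignoring stopwords."""
    # Lowercase each whitespace-separated word and trim surrounding punctuation.
    words = (w.strip(string.punctuation).lower() for w in text.split())
    counts = Counter(w for w in words if w and w not in stopwords)
    return counts.most_common(n)


print(top_content_words("The grammar of the grammar test is a grammar error.", n=2))
# [('grammar', 3), ('test', 1)]
```

<p>Ties are returned in the order the words were first encountered, which is how <code>Counter.most_common</code> behaves; a word cloud would then scale each word by its count.</p>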
<p>(Link to Wordle found in <a href='http://ynada.com'>Cornelis Puschmann</a>&#8216;s del.icio.us feed.)</p>
]]></content:encoded>
			<wfw:commentRss>http://ideophone.org/more-visualizations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Many Eyes on Siwu ne</title>
		<link>http://ideophone.org/many-eyes-on-siwu-ne/</link>
		<comments>http://ideophone.org/many-eyes-on-siwu-ne/#comments</comments>
		<pubDate>Tue, 24 Jun 2008 18:00:22 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[Linguistics]]></category>
		<category><![CDATA[Siwu]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://ideophone.org/?p=77</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Many+Eyes+on+Siwu+%3Cem%3Ene%3C%2Fem%3E&amp;rft.aulast=Dingemanse&amp;rft.aufirst=Mark&amp;rft.subject=Linguistics&amp;rft.subject=Siwu&amp;rft.subject=Software&amp;rft.subject=Visualization&amp;rft.source=The+Ideophone&amp;rft.date=2008-06-24&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://ideophone.org/many-eyes-on-siwu-ne/&amp;rft.language=English"></span>
Lots of readers looked at the challenge I posted last week (my blog statistics say more than 450 views for the post alone, so that&#8217;s many eyes indeed). A few of you were even daring enough to come up with &#8230; <a href="http://ideophone.org/many-eyes-on-siwu-ne/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Many+Eyes+on+Siwu+%3Cem%3Ene%3C%2Fem%3E&amp;rft.aulast=Dingemanse&amp;rft.aufirst=Mark&amp;rft.subject=Linguistics&amp;rft.subject=Siwu&amp;rft.subject=Software&amp;rft.subject=Visualization&amp;rft.source=The+Ideophone&amp;rft.date=2008-06-24&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://ideophone.org/many-eyes-on-siwu-ne/&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://ideophone.org/?p=77"><!-- &nbsp; --></abbr>
<p>Lots of readers looked at the <a href="http://ideophone.org/visual-corpus-linguistics/">challenge</a> I posted last week (my blog statistics say more than 450 views for the post alone, so that&#8217;s many eyes indeed). A few of you were even daring enough to come up with a story on the various functions of Siwu <em>ne</em>. The challenge was probably a bit too difficult (involving an untranslated text in an as yet undescribed Niger-Congo language), which makes those few attempts all the more heroic. So what did they see?</p>
<div class='img img-full'>
<img src="http://ideophone.org/files/ne-right.png" alt="ne" title="ne" />
<div><em>ne</em> and its right periphery; &#8220;ne, &#8230;&#8221; accounts for almost half of the tokens</div>
</div>
<p><a href="http://ideophone.org/visual-corpus-linguistics/#comment-532">Brett</a> was the first to bite the bullet, providing some statistics on the use of <em>ne</em>. He noted that &#8220;it occurs sentence initially 158 times (out of 1161) and sentence terminally 83 times. (&#8230;) It often seems to bracket a whole clause and it can even be doubled. The <em>ne kama ne</em> string is quite common.&#8221; <a href="http://ideophone.org/visual-corpus-linguistics/#comment-537">Ray Girvan</a> didn&#8217;t trust the visualization and inspected the raw text instead. He discovered that the text contains some dialogues &#8220;as well as as a complete song/poem with multiple uses of “ne” in a question&#8221;; that the construction <em>Si &#8230; ne</em> occurs frequently in it; and that the text probably consists of several different text types. Then came <a href="http://ideophone.org/visual-corpus-linguistics/#comment-540">Jason</a> with a number of rather detailed observations:</p>
<blockquote>
<ul>
<li>“ne” is the most common word in the sample data. It is more common than the most common punctuation marks, “.” and “,”. It is the most common word following a period (“. ne” gives 158 hits; the runner-up “. si” has 115) and the most common word preceding a period (“ne .”: 84; “ame .”: 72).
</li>
<li>“ne” usually appears before or after a “.” or “,”:
<ul>
<li>42.9%: immediately before “,”</li>
<li>13.6%: sentence-initial (“ne” after “.” “?” or “!”)</li>
<li>7.8%: sentence-final (“ne” before “.” “?” or “!”)</li>
<li>4.1%: immediately after “,”</li>
<li>31.5%: other</li>
</ul>
</li>
<li>There are 9 hits for “ne , ne” and one for “ne ne”. (“Si maɔsɛgu ne ne, ɔrɔ̃go kpakpa gɔ mpia…”)</li>
<li>“ne kama ne” appears 31 times, almost always after punctuation.</li>
<li>The most common word after a sentence ending in “ne .” is “ne”.</li>
</ul>
</blockquote>
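<p>Jason did not say exactly how he computed these figures, so here is a minimal sketch of one way to get such a breakdown in Python, assuming the text is already tokenized with punctuation marks as separate tokens (as in the Many Eyes dataset, where <em>ne ,</em> shows up as two words). The category names and the order in which they are checked are my own assumptions: an occurrence that both follows a period and precedes a comma is counted once, as &#8216;before comma&#8217;. The toy token list is made up and not taken from the corpus.</p>

```python
from collections import Counter

SENTENCE_END = {".", "?", "!"}


def position_profile(tokens, target="ne"):
    """Classify each occurrence of target by its neighbouring punctuation.
    Categories are checked in order, so each occurrence is counted once."""
    profile = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        prev = tokens[i - 1] if i > 0 else None
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt == ",":
            profile["before comma"] += 1
        elif prev is None or prev in SENTENCE_END:
            profile["sentence-initial"] += 1
        elif nxt in SENTENCE_END:
            profile["sentence-final"] += 1
        elif prev == ",":
            profile["after comma"] += 1
        else:
            profile["other"] += 1
    return profile


def count_ngram(tokens, ngram):
    """Count occurrences of a token sequence such as ("ne", "kama", "ne")."""
    n = len(ngram)
    return sum(1 for i in range(len(tokens) - n + 1)
               if tuple(tokens[i:i + n]) == tuple(ngram))


def as_percentages(profile):
    """Convert raw counts to percentages of all occurrences, to one decimal."""
    total = sum(profile.values())
    return {k: round(100 * v / total, 1) for k, v in profile.items()}


# A made-up toy token sequence, not taken from the corpus.
toy = "ne kama ne , ka ma se . si ma ne . ne oso ne , ka".split()
print(as_percentages(position_profile(toy)))
print(count_ngram(toy, ("ne", "kama", "ne")))
```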
<p>Then there were a couple of anonymous <a href='http://services.alphaworks.ibm.com/manyeyes/view/St29TOsOtha6l5VicxxZO2~#t29TOsOtha6k5FVYxxZO2~'>comments</a> on the visualizations themselves, perhaps from the above gentlemen, perhaps from others. One contrasted the left and the right periphery of <em>ne</em>, noting that the first was &#8220;very very branchy&#8230; indicating perhaps (a) sentence-final &#8220;ne&#8221; attaches to the preceding clause or sentence, rather than the preceding word or phrase; (b) all these words belong to some small number of open classes&#8211; they&#8217;re all nouns or adjectives, for example.&#8221; By comparison, this commenter noted, the visualization of sentence-initial &#8220;ne&#8221; is not nearly as branchy.</p>
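<p>The commenter&#8217;s notion of branchiness can be turned into a rough number. A minimal sketch, again assuming pre-tokenized text: for sentence-final occurrences collect the word to the left, for sentence-initial ones the word to the right, and divide the number of distinct neighbours by the number of occurrences. The measure and the function names are my own illustration, not something Many Eyes computes, and the toy tokens are invented.</p>

```python
from collections import Counter

SENTENCE_END = {".", "?", "!"}


def periphery(tokens, target="ne", position="final"):
    """Count the neighbours on the open side of target: the preceding word for
    sentence-final occurrences, the following word for sentence-initial ones."""
    found = Counter()
    for i, tok in enumerate(tokens):
        if tok != target or i + 1 >= len(tokens):
            continue
        if position == "final" and i > 0 and tokens[i + 1] in SENTENCE_END:
            found[tokens[i - 1]] += 1
        elif position == "initial" and (i == 0 or tokens[i - 1] in SENTENCE_END):
            found[tokens[i + 1]] += 1
    return found


def branchiness(neighbours):
    """Distinct neighbour types per occurrence: 1.0 means every occurrence has
    a different neighbour; values near 0 mean a few words dominate."""
    total = sum(neighbours.values())
    return len(neighbours) / total if total else 0.0


# Invented toy tokens: varied words before final "ne", one collocation after initial "ne".
toy = "a ne . ne kama x . b ne . ne kama y . c ne .".split()
print(branchiness(periphery(toy, position="final")))    # 1.0
print(branchiness(periphery(toy, position="initial")))  # 0.5
```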
<h2>So what does <em>ne</em> do?</h2>
<p>So much for the frequency-based observations. Based on this, what can we say about the possible functions of <em>ne</em>? Brett&#8217;s guess was that &#8220;it has a grammatical function. Perhaps a negation (though I may be influenced by the spelling) or a question marker.&#8221; Ray didn&#8217;t place a bet and stuck to his observations. Jason said: &#8220;I’m guessing it performs some function that is not served by any English word. Maybe it indicates past tense. (&#8230;) For a while I was thinking preposition, but it seems really inconvenient for a language to prefer that prepositions appear at the extremes of sentences.&#8221;, and came back later to add &#8220;Maybe it’s a topic marker.&#8221;</p>
<p>Let me first disclose my own story before considering the above hypotheses. <em>Ne</em> is extremely common sentence- and clause-finally. <em class='highlight'>In this context, its main function is to mark stuff as the topic on which a comment will follow.</em> That is, it is a discourse particle used by speakers to signal &#8216;this was background information, more will come&#8217;. Because of this, it also functions as a floorholder, signalling that this was not yet the end of the interactional turn (even though the sentence may have been grammatically complete). Here is a glossed and translated example of clause-final <em>ne</em>:</p>
<dl class='interlinear'>
<dt class='langdata'>Si ma-ɔ-ro ɔturi ɔkpa <em class='highlight'>ne</em>,</dt>
<dd class='gloss'>If they-PERF-finish person parting <em class='highlight'>TOPIC</em></dd>
<dt class='langdata'>ka mà-a-sɛ mà-a-pie wũ ndu</dt>
<dd class='gloss'>IMM they-FUT-go they-FUT-wash him water</dd>
<dd>When they have finished taking leave of the person [who died, MD], they will go and wash him</dd>
</dl>
<p>The <em>Si &#8230; ne</em> construction noted by Ray is also part of this picture. You couldn&#8217;t have known, of course, but <em>si</em> means &#8216;if&#8217;, so <em>si &#8230; ne</em> sentences are roughly equivalent to if-then clauses. Note, however, that Siwu <em>ne</em> does not correspond to &#8216;then&#8217; but rather (I would say) to the rising intonation of the &#8216;if&#8217; clause in English: it backgrounds the condition and announces the &#8216;then&#8217; clause.</p>
<p>The anonymous commenter&#8217;s note that sentence-final <em>ne</em> is &#8216;very branchy&#8217; nicely accords with this view of <em>ne</em> as a discourse marker. Basically, you can tack it onto any type of sentence; there are no grammatical restrictions. This makes Jason&#8217;s preposition hypothesis less likely (prepositions usually occur in well-defined contexts). His other proposal, that <em>ne</em> &#8216;performs some function not served by any English word&#8217;, is not without problems; we do have topicalizing and floor-holding devices in English, of course (politicians cannot utter a sentence without them). On the other hand, one could argue that these devices (front-shifting, &#8216;<em>uhm</em>&#8217;, intonation, etc.) are not nearly as wordlike as Siwu <em>ne</em> (as shown, for example, by the fact that the former do not show up as regularly in written texts as the latter).</p>
<p>So there we have function number one: <em>ne</em> as a topicalizing discourse particle. Jason&#8217;s last guess turns out to be spot on.</p>
<h2>Sentence-initial <em>ne</em></h2>
<div class='img img-full'>
<img src="http://ideophone.org/files/ne-after-period-left.png" alt="ne" title="ne" />
<div>Sentence-initial <em>ne</em>, arranged by frequency.</div>
</div>
<p>There is also a sizable number of sentences <em>starting</em> with <em>ne</em> (13.6% of all occurrences according to Jason&#8217;s statistics). About 60% of these are taken up by two collocations: <em>ne ɔso (ne)</em> and <em>ne kama ne</em>. The latter, first identified by Brett, is essentially a lexicalized construction combining clause-initial and clause-final <em>ne</em>, and it means something like &#8220;And after that, &#8230;&#8221;, <em>kama</em> meaning &#8216;back, behind&#8217;. The former, a combination of <em>ne</em> with <em>ɔso</em> &#8216;reason&#8217;, can be translated as &#8220;Therefore&#8221;. Thus, both constructions are rather similar to conjunctive adverbs in English, sometimes with a final <em>ne</em> added as a topicalizer announcing the comment that is to follow.</p>
<p>Overall, my feeling is that sentence-initial <em>ne</em> is very similar in function to sentence-initial <em>and</em> in English (for a good overview see Dorgeloh 2004). That is, it contributes to the overall textual cohesion by coordinating idea structures in discourse. It is also the most unmarked mode of connection in stories. Here is an interlinearized example of sentence-initial <em>ne</em>:</p>
<dl class='interlinear'>
<dt class='langdata'><em class='highlight'>Ne</em> sɔ lo-karɛ mi sɔ nda kaa se?</dt>
<dd class='gloss'><em class='highlight'>ne</em> COMP I-ask you.pl COMP how home is</dd>
<dd>And he said, &#8216;I asked you, how is home?&#8217;</dd>
</dl>
<p>One remaining question of course is: what do sentence-initial and sentence-final <em>ne</em> have to do with each other, if anything? My first pass at an answer would be that both uses have to do with <em>speaker continuation</em>, or the &#8216;speaker-imposed structuring of ideas&#8217;, as Dorgeloh (2004:1763) calls it; but this is an issue that needs to be investigated more closely.</p>
<p>Let me just end with a final note on the nature of the corpus. It consists of a lengthy text on traditional funeral rites in Kawu (Ray, this is the part that includes the poem, actually a funeral dirge); a long narrative about Tete Kalai, son of a smith; a personal narrative by a retired preacher; and some ten shorter stories from fast track literacy participants. All writers are adult native speakers of Siwu. The texts are set in the official orthography, which doesn&#8217;t mark tone. All in all, it seems that even this first corpus, though relatively small, is large and varied enough to bring out salient grammatical structures, just as I had hoped. </p>
<p>Thanks to the fellow language bloggers at Language Log for <a href='http://languagelog.ldc.upenn.edu/nll/?p=251'>bringing the challenge</a> to the attention of its readers; and above all, a big thank you to <a href='http://english-jack.blogspot.com/'>Brett</a>, <a href='http://segalbooks.blogspot.com/'>Ray</a>, and <a href='http://jorendorff.blogspot.com/'>Jason</a> for your invaluable input! The Siwu for &#8216;thank you (pl)&#8217; literally means &#8216;you are the ones who did the work&#8217;. Very true. <em>Mìlobrara!</em></p>
<h2>References</h2>
<ol class='references'>
<li>Dorgeloh, Heidrun. 2004. Conjunction in sentence and discourse: sentence-initial <em>and</em> and discourse structure. <em>Journal of Pragmatics</em> 36, no. 10 (October): 1761-1779.</li>
<li>Sacks, Harvey, Emanuel A. Schegloff, and Gail Jefferson. 1974. A Simplest Systematics for the Organization of Turn-Taking for Conversation. <em>Language</em> 50, no. 4 (December): 696-735.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://ideophone.org/many-eyes-on-siwu-ne/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Visual corpus linguistics with Many Eyes</title>
		<link>http://ideophone.org/visual-corpus-linguistics/</link>
		<comments>http://ideophone.org/visual-corpus-linguistics/#comments</comments>
		<pubDate>Sat, 14 Jun 2008 21:00:10 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[Linguistics]]></category>
		<category><![CDATA[Siwu]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://ideophone.org/?p=76</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Visual+corpus+linguistics+with+%3Cem%3EMany+Eyes%3C%2Fem%3E&amp;rft.aulast=Dingemanse&amp;rft.aufirst=Mark&amp;rft.subject=Linguistics&amp;rft.subject=Siwu&amp;rft.subject=Software&amp;rft.subject=Visualization&amp;rft.source=The+Ideophone&amp;rft.date=2008-06-14&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://ideophone.org/visual-corpus-linguistics/&amp;rft.language=English"></span>
I recently came across Many Eyes, a nifty data visualisation tool by IBM&#8217;s Visual Communication Lab. It has lots of options to handle tabular data, but &#8212;more interesting to linguists&#8212; it can also handle free text. The two visualization options &#8230; <a href="http://ideophone.org/visual-corpus-linguistics/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Visual+corpus+linguistics+with+%3Cem%3EMany+Eyes%3C%2Fem%3E&amp;rft.aulast=Dingemanse&amp;rft.aufirst=Mark&amp;rft.subject=Linguistics&amp;rft.subject=Siwu&amp;rft.subject=Software&amp;rft.subject=Visualization&amp;rft.source=The+Ideophone&amp;rft.date=2008-06-14&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://ideophone.org/visual-corpus-linguistics/&amp;rft.language=English"></span>
<abbr class="unapi-id" title="http://ideophone.org/?p=76"><!-- &nbsp; --></abbr>
<p>I recently came across <a href='http://services.alphaworks.ibm.com/manyeyes/' ><em>Many Eyes</em></a>, a nifty data visualization tool by IBM&#8217;s Visual Communication Lab. It has lots of options to handle tabular data, but, more interesting to linguists, it can also handle free text. The two visualization options it currently offers for text are a tag cloud and a so-called &#8216;word tree&#8217;. The former visualizes simple token frequency; the latter displays the occurrences of a given word (or phrase) in a branching view. The word tree is the feature I find most exciting, because it allows for rapid visual exploration of linguistic patterns in a text.</p>
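<p>This is not the Many Eyes code, just a minimal Python sketch of the idea behind a word tree: collect the words that follow each occurrence of a root word, up to a chosen depth (a hypothetical parameter here), and merge shared prefixes into branches with counts. Running the same procedure on the reversed token list would give the view of preceding words. The toy tokens use made-up ASCII spellings loosely modelled on the locative examples discussed below.</p>

```python
from collections import Counter


def word_tree(tokens, root, depth=3):
    """Map each word following `root` to (count, subtree), up to `depth` words."""
    continuations = [tuple(tokens[i + 1:i + 1 + depth])
                     for i, tok in enumerate(tokens) if tok == root]
    return _grow(continuations)


def _grow(sequences):
    """Group sequences by their first word and recurse on the remainders."""
    tree = {}
    # Most frequent continuations first, like ordering the word tree by frequency.
    firsts = Counter(seq[0] for seq in sequences if seq)
    for word, count in firsts.most_common():
        rest = [seq[1:] for seq in sequences if seq and seq[0] == word]
        tree[word] = (count, _grow(rest))
    return tree


def show(tree, indent=0):
    """Print the tree with one branch per line, indented by depth."""
    for word, (count, subtree) in tree.items():
        print("  " * indent + f"{word} ({count})")
        show(subtree, indent + 1)


show(word_tree("i mmo . ka i mmo ka . i ame".split(), "i", depth=2))
```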
<div class="img img-full">
Many Eyes applet temporarily disabled.<br />
</div>
<p>Take for instance the Siwu locative marker <em>i</em>. Before today I vaguely knew where it usually occurs (before an NP and after a VP, more or less). Now I know (1) that it also occurs sentence-initially, as in <em class='langdata'>I Ɔtuka ame, ...</em> {LOC Lolobi inside} 'In Lolobi country, ...'; (2) that it often precedes a deictic, as in <em class='langdata'>...i mmɔ</em> {LOC there} 'over there'; and (3) that one can have nested occurrences, as in <em class='langdata'>ma-sɛ ma-a-su kaku i ngbe-gɔ i ɔturi ɔ-kpi mmɔ</em> {they-HAB they-FUT-take funeral LOC here-REL LOC person he-died there} 'they usually will hold the funeral where ('in the place in which') the person died'. The next step is to look more carefully into these particular constructions and improve my grammatical analysis. I might conclude, for example, that the distal deictic <em class='langdata'>mmɔ</em> is more nouny than I had taken it to be.</p>
<p>Of course, I would have discovered these facts eventually after carefully analyzing enough Siwu texts &mdash; but the point is that right now, finding and comparing these patterns took only five minutes of playing around with the word tree above. Cool, isn't it? Let's call it <em>visual corpus linguistics</em>. </p>
<p>Some of the best options, by the way, are not available in the embedded view above. Take a look at the <a target='blank' href='http://services.alphaworks.ibm.com/manyeyes/view/St29TOsOtha6gQk~zd0WO2~'>full version</a> to order alphabetically or by frequency, or try the start/end radio buttons, which let you view preceding and following words. And if you don't happen to read Siwu, why don't you check out Shakespeare's <a href='http://services.alphaworks.ibm.com/manyeyes/view/S95RjIsOtha6X6UkmqrkI2-' target='blank'> All's Well That Ends Well</a>?</p>
<h2>Collaborative data mining</h2>
<p>Wait, let me retract that. It doesn't even matter that you don't know any Siwu &mdash; the power of frequency-based visualization is that you should be able to spot salient grammatical patterns anyway. It would be interesting to test this. <em class='highlight'>Can you, based on <a target='blank' href='http://services.alphaworks.ibm.com/manyeyes/view/St29TOsOtha6TRk8HJ5WO2~'>this dataset</a> of 21,891 Siwu words and punctuation marks, work out what kind of functions the word <em>ne</em> might have?</em></p>
<p>Perhaps I'm making it too difficult by not providing the meaning of the Siwu text, but I do think that this type of collaborative pattern hunting is an interesting addition to the toolkit for linguistic analysis. In the future, we will use visualizations like this to get a feeling for the relative frequencies of different constructions in which an item occurs, and to quickly test each other's hypotheses. The philosophy behind Many Eyes is interesting in this respect. As the <a href='http://services.alphaworks.ibm.com/manyeyes/page/About_Many_Eyes.html'>About</a> page says: <cite>Many Eyes is a bet on the power of human visual intelligence to find patterns. Our goal is to "democratize" visualization and to enable a new social kind of data analysis.</cite> </p>
<p>I am quite sure that there are specialized corpus linguistic applications out there that have far more sophisticated data-munching and searching capabilities. But the simplicity of Many Eyes may be its secret power. It is totally hassle-free: feed it a text and you can instantly play around with different types of visualizations. The intuitive interface makes it possible to quickly traverse the data in search of patterns, or do some quick testing of constructional hypotheses. In fact, you do not even need to upload your own dataset, because you can create visualizations based on any existing dataset (all the data is publicly available). And did I mention it is free?</p>
<h2>Limitations</h2>
<p>Despite all the goodness, Many Eyes is not perfect, although to be fair, it wasn't exactly made for linguistic analysis to begin with. After some playing around, you will inevitably hit upon the limitations; for me, it was a bit disappointing for instance to see that the tag cloud chokes on Unicode characters outside the basic Latin range. The following few relatively simple improvements would hugely enhance the linguistic uses of Many Eyes:</p>
<ul>
<li>tag cloud: proper handling of Unicode characters outside the basic Latin range.</li>
<li>visualization over multiple data sets. (Right now, a workaround is to make a new dataset combining other data sets. However, this is not straightforward.)</li>
<li>word tree: positional wildcards to enable searching for patterns like Siwu "<em>i * ame</em>" (LOC X inside) or English "<em>I * know</em>" (capturing 'I don't know', 'I damn well know', etc.). This would involve a branching tree between two focus words, which would be very interesting visually. It should be possible to limit the search to N intervening words, or to search for an arbitrary number of intervening words (respecting sentence boundaries, I guess).</li>
<li>word tree: show target word in full context (i.e. show the words left and right of it). Note that this is not trivial, as there is the problem of how to connect left and right parts of sentences on potential regroupings. The sort by occurrence option would no longer work in any case. Perhaps highlighting the relevant connector lines on mouseover would be a good solution.</li>
<li>word tree: some way to do partial word searches, e.g. "<em class='langdata'>*kɛlɛ</em>" to find all occurrences of the verb <em class='langdata'>kɛlɛ</em> 'go' regardless of subject agreement and tense/aspect prefixes. This would involve non-trivial ordering and visualization problems (how to visualize the difference between a preceding word and a preceding part of a word? Perhaps this would add too much clutter).</li>
</ul>
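<p>Neither of the two search features above exists in Many Eyes. As a rough illustration of what they would compute, here is a minimal Python sketch with hypothetical function names, assuming pre-tokenized text. <code>gap_search</code> finds <em>first * last</em> patterns with at most <code>max_gap</code> intervening words inside one sentence; <code>suffix_search</code> groups all tokens ending in a given stem, which is one simple reading of a partial-word query like <em>*kɛlɛ</em>. The toy tokens are invented.</p>

```python
from collections import Counter

SENTENCE_END = {".", "?", "!"}


def gap_search(tokens, first, last, max_gap=3):
    """Find 'first * last' spans with at most max_gap intervening words,
    without crossing a sentence boundary; keeps the shortest span per start."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        # j ranges over positions leaving at most max_gap words in between.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            if tokens[j] in SENTENCE_END:
                break
            if tokens[j] == last:
                hits.append(tokens[i:j + 1])
                break
    return hits


def suffix_search(tokens, stem):
    """Partial-word search: count every token ending in stem, so that
    prefixed verb forms are grouped with the bare stem."""
    return Counter(tok for tok in tokens if tok.endswith(stem))


# Made-up toy tokens for illustration only.
toy = "i town ame , i big old town ame . i x . ame".split()
print(gap_search(toy, "i", "ame", max_gap=3))
print(suffix_search("ma-kele kele o-kele mele".split(), "kele"))
```

<p>Setting <code>max_gap</code> to the length of the text would give the &#8216;arbitrary number of intervening words&#8217; variant, still bounded by the sentence-boundary check.</p>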
<p>Until these are implemented (either in Many Eyes or in a specialized linguistic cousin), those of us working on radically isolating languages will probably be happier with the tool than those working on the polysynthetic end of the continuum. Be that as it may, <em>Many Eyes</em> remains an exceptionally useful application, offering a glimpse of the bright future of visual corpus linguistics. Thank you, <a href='http://www.research.ibm.com/visual/'>VCL people</a>!</p>
<p>(Hat tip: <a href='http://www.ethanzuckerman.com/blog/2008/06/13/simple-examples-of-cool-ideas-last-post-from-mit-conference/' >Ethan Zuckerman</a>.)</p>
<h2>Links</h2>
<ul>
<li><a href='http://services.alphaworks.ibm.com/manyeyes/page/About_Many_Eyes.html'>About Many Eyes</a></li>
<li><a href='http://www.research.ibm.com/visual/'>Visual Communication Lab</a></li>
<li><a href='http://manyeyes.alphaworks.ibm.com/blog/'>Many Eyes Development Blog</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://ideophone.org/visual-corpus-linguistics/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

