ValPagliai/code_switching

code_switching

Analysis of Spanish-English code-switching from the Miami Corpus.

Authors: Valeria Pagliai & Josh Higdon

We processed the data with NumPy, pandas, NLTK and Python's Counter. The first step was to remove the punctuation marks that appear in the data as separate items, together with the filler words. Second, since the tags provided included a translation of the word and extra detail that we did not consider relevant for the analysis, we kept only the second element of each tag. Using if statements, we handled the special cases: the tag “N” for any proper noun, “V” for every verb (without its conjugation detail), and “element + V” for a contraction.
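The cleaning and tag-simplification step can be sketched as follows. This is a minimal illustration, not the code used in the project: the tag format (`translation.POS.detail`, with `+` joining the parts of a contraction), the bare `name` gloss for proper nouns, the `FILLERS` set, and the helper names `keep_token` and `simplify_tag` are all assumptions made for the example, and the sample rows are invented.

```python
import string

# Hypothetical filler words; the real list depends on the corpus transcription.
FILLERS = {"um", "uh", "er"}


def keep_token(word):
    """Return False for punctuation-only items and filler words."""
    is_punct = all(ch in string.punctuation for ch in word)
    return not is_punct and word.lower() not in FILLERS


def simplify_tag(raw_tag):
    """Reduce a raw tag such as 'go.V.INFIN' to its second element ('V')."""
    # Assumption: '+' joins the two halves of a contraction.
    if "+" in raw_tag:
        return "+".join(simplify_tag(part) for part in raw_tag.split("+"))
    fields = raw_tag.split(".")
    if len(fields) < 2:
        # Assumption: a tag without a second element marks a proper noun.
        return "N"
    return fields[1]


# Invented sample rows: (word, raw tag)
rows = [
    ("I", "I.PRON.SUB.1S"),
    (".", "."),
    ("um", "um.IM"),
    ("that's", "that.PRON.DEM.FAR+be.V.3S.PRES"),
    ("María", "name"),
]

cleaned = [(word, simplify_tag(tag)) for word, tag in rows if keep_token(word)]
print(cleaned)
```

With these assumptions, the punctuation item and the filler are dropped, the contraction becomes `PRON+V`, and the proper noun becomes `N`.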

Once we had a list containing only the tags, we added it to the dataset as a new column. Third, we grouped the data by sentence identifier, which allowed us to split it (using NumPy) into three lists: sentences, tags and languages. These lists were then converted into lists of bigrams with NLTK. Fourth, we merged the bigram lists into a single list of tuples of the form [word1, word2, tag1, tag2, lang1, lang2]. We then found instances of code-switching by iterating over the 5th and 6th elements (the two language labels) of every tuple and adding the tuples where the language changes to a new list. Finally, we took the 3rd and 4th elements (tag1 and tag2) of those tuples to build our lists of tag pairs. The same process was performed for each of the 10 files studied. At the end, Counter was used to count the occurrences of each pair of tags.
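The bigram, switch-detection and counting steps can be sketched for a single sentence as below. This is an illustrative sketch under stated assumptions rather than the project's code: it builds bigrams with plain `zip` instead of NLTK so that it runs with the standard library alone, the function names `bigrams` and `switch_tag_pairs` are hypothetical, the language labels `eng` and `spa` are assumed, and the sentence is invented.

```python
from collections import Counter


def bigrams(items):
    """Pair each item with its successor (stand-in for NLTK's bigram helper)."""
    return list(zip(items, items[1:]))


def switch_tag_pairs(words, tags, langs):
    """Return (tag1, tag2) for every bigram whose two words differ in language."""
    # Merge the three bigram lists into (word1, word2, tag1, tag2, lang1, lang2).
    merged = [
        w + t + l
        for w, t, l in zip(bigrams(words), bigrams(tags), bigrams(langs))
    ]
    # A code-switch is a bigram where lang1 != lang2.
    switches = [row for row in merged if row[4] != row[5]]
    # Keep only tag1 and tag2 of each switch.
    return [(row[2], row[3]) for row in switches]


# Invented example sentence with already simplified tags.
words = ["I", "want", "una", "casa"]
tags = ["PRON", "V", "DET", "N"]
langs = ["eng", "eng", "spa", "spa"]

pairs = switch_tag_pairs(words, tags, langs)
tag_pair_counts = Counter(pairs)
print(tag_pair_counts)
```

Here the only switch falls between "want" and "una", so the single tag pair counted is `("V", "DET")`. Running the same function over every sentence of every file and updating one `Counter` would give the overall frequency of each tag pair.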
