Analysis of Spanish-English code-switching from the Miami Corpus.
Authors: Valeria Pagliai & Josh Higdon
For our libraries and packages, we used NumPy, pandas, NLTK, and Counter (from Python's collections module) to process the data. The first step was to remove all punctuation listed as its own item, along with the filler words, from the elements in the data. Second, since the tags provided included a translation of the word and extra detail that we considered not relevant for the analysis, we kept only the second element of this column. We then simplified the tags through a series of if statements covering several cases: "N" for any proper noun, "V" for every verb (dropping its conjugation detail), and "element + V" whenever the token was a contraction.
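The tag-simplification step could be sketched as below. The raw tag format (word, POS, translation separated by periods) and the function name are assumptions for illustration; the real corpus annotation may use a different delimiter and field order.

```python
def simplify_tag(raw):
    """Keep only the POS portion of a raw corpus tag.

    Assumed raw format: "word.POS.detail" (hypothetical; adapt the
    delimiter and field index to the actual annotation scheme).
    """
    parts = raw.split(".")
    # keep the second element of the column, as described above
    pos = parts[1] if len(parts) > 1 else parts[0]
    if "+" in pos:                      # contraction: keep "element + V" shape
        left, _, _ = pos.partition("+")
        return left + "+V"
    if pos.startswith("N"):             # any (proper) noun collapses to "N"
        return "N"
    if pos.startswith("V"):             # strip conjugation detail
        return "V"
    return pos
```

For example, a verb tag carrying person and tense detail would reduce to plain "V", while a contraction tag such as "PRON+V.3S" would keep its "PRON+V" shape.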
After obtaining a list of the tags alone, we added this information as a new column to the dataset. Third, we grouped the rows by their sentence identifier number, which allowed us to split each sentence into three different lists (using NumPy): words, tags, and languages. These lists were then transformed into lists of bigrams with NLTK. Fourth, we merged these lists into a list of tuples of the form (word1, word2, tag1, tag2, lang1, lang2). With this, we found instances of code-switching by iterating over the fifth and sixth element of every tuple and, whenever the two languages differed, adding the tuple to a new list. Finally, we took the third and fourth components to obtain our lists of tag pairs. The same process was performed for each of the ten files studied. At the end, Counter was used to count the number of appearances of each pair of tags.
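The bigram-and-counting steps above could be sketched as follows. The input format (each sentence as a list of (word, tag, lang) triples) and the function name are assumptions; adjacent tokens are paired here with plain zip, which behaves the same as nltk.bigrams used in the actual pipeline.

```python
from collections import Counter

def switch_tag_pairs(sentences):
    """Count POS tag pairs at code-switch points.

    `sentences` is assumed to be a list of sentences, each a list of
    (word, tag, lang) triples, as produced by the grouping step.
    """
    counts = Counter()
    for sent in sentences:
        # pair each token with its successor (equivalent to nltk.bigrams)
        for (w1, t1, l1), (w2, t2, l2) in zip(sent, sent[1:]):
            if l1 != l2:               # language changes mid-sentence
                counts[(t1, t2)] += 1  # record the tag pair at the switch
    return counts
```

Running this over all ten files and summing the resulting counters would yield the overall frequency of each tag pair at switch points.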