Here is my proposal:
PLATFORM
- Convert the input file to UTF-8 encoded CSV format.
- Use a custom Java application to process the data. Java has excellent Unicode support.
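As a sketch of the re-encoding step, here is the kind of conversion involved. The source encoding below (windows-1250, common for legacy Polish text) is an assumption; the real input file would dictate the actual charset, and the CSV field layout would follow the specification.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Recode {
    // Decode raw bytes from a legacy charset and re-encode them as UTF-8.
    static byte[] toUtf8(byte[] raw, Charset source) {
        return new String(raw, source).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "łódź" as encoded in windows-1250 (an assumed source encoding).
        byte[] cp1250 = {(byte) 0xB3, (byte) 0xF3, (byte) 0x64, (byte) 0x9F};
        String decoded = new String(toUtf8(cp1250, Charset.forName("windows-1250")),
                                    StandardCharsets.UTF_8);
        System.out.println(decoded); // łódź
    }
}
```

Java's `Charset` machinery handles this conversion losslessly for any encoding it knows, which is why a custom Java application is a safe choice here.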
DATA
- The provided input file.
- Alphabetized term lists for English and Polish (found or generated).
- A "stoplist" of words to ignore, such as the part-of-speech markers.
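The stoplist check itself is trivial; the sample markers below are placeholders, since the real list would be generated from the input file and then hand-edited.

```java
import java.util.Set;

public class Stoplist {
    // Hypothetical part-of-speech markers to ignore; the real stoplist
    // would be built from the actual input and manually reviewed.
    static final Set<String> STOP = Set.of("n.", "v.", "adj.", "adv.");

    static boolean isStopWord(String token) {
        return STOP.contains(token.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(isStopWord("adj.")); // true
        System.out.println(isStopWord("dom"));  // false
    }
}
```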
PROCESS
(1) Generate all necessary data files (lists of words, abbreviations, etc.) and manually edit these if necessary.
(2) Do an initial parse of the input file simply to make sure all the input data is identifiable.
(3) Parse and generate the output files, following the directions.
(4) Spot check the output and look for improvements; ask questions.
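Step (2) above can be sketched as a validation pass that reports every line the parser would not recognize, so problems surface before any output is generated. The line pattern here (term, tab, translation) is purely an assumption for illustration; the real pattern depends on the delivered format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ValidatePass {
    // Assumed line shape: "term<TAB>translation". The actual regular
    // expression would be derived from the input file's real format.
    static final Pattern ENTRY = Pattern.compile("^\\S.*\\t.*\\S$");

    // Return the 1-based numbers of lines the parser does not recognize,
    // so they can be reviewed (or added to the spec) before generating output.
    static List<Integer> unidentified(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (!ENTRY.matcher(lines.get(i)).matches()) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("dom\thouse", "broken line", "pies\tdog");
        System.out.println(unidentified(sample)); // [2]
    }
}
```

Running this first keeps step (3) honest: the generation pass only has to handle line shapes the validation pass has already confirmed exist.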
OUTCOMES
With this type of project, it is usually straightforward to reach around 80% accuracy. Further work and refinement should raise that percentage, but the last few percentage points are typically almost impossible to win without manual intervention.
My offer is to implement the specification you have given, plus any reasonable modifications that substantially improve the output. I will consider the work done once the remainder is best handled by manual editing, or once we encounter many unanticipated cases that do not lend themselves to a straightforward algorithm (e.g., cases requiring machine learning).