
534986 Statistician


In progress
Posted: over 12 years ago


Paid on delivery
The project should be done with the free statistical software "R" and the MySQL database. For this task, the statistician does not need to know PHP. The focus is on optimizing the variable settings using R and MySQL.

Project: Integrating Open Source Optimization Software in Back Testing

Overview:
We need help implementing open source optimization software ([login to view URL]) to bridge the gap between stochastic integer programming and mixed integer programming problems. R needs to be set up to 'guess' until it finds the optimal values for about 15 variables. In the past, a similar problem was solved using a brute force methodology; however, there are now too many variable values for brute force to be feasible. Also, commercial software (SAS) was used for the statistical analysis, but it is difficult to find freelance statisticians who have access to SAS (or other commercial software systems), which forces us to rely on open source software. Fortunately, there seem to be enough modules in R to handle the tasks.

Environment:
The operating system will be Debian 6.0 with PHP 5.3.3-7 and MySQL 5.1.49-3. If needed, the system can be run on a cloud server (for processing power). The algorithm is written in PHP. The data is stored in MySQL, and R seems to have modules to handle the test logistics and the statistical analysis.

Here are the R software modules we'll probably use:
R + MySQL: [login to view URL]
R Quantitative Financial Modelling & Trading Framework: quantmod [login to view URL]
R + PHP: r-php [login to view URL] (call PHP from R to run the algorithm)
R MIP Solvers: GLPK, CLP, SYMPHONY, LP_SOLVE
R Testing Pairs of Symbols for Cointegration: [login to view URL]
R + SOMA / STOPROG: a stochastic optimization algorithm similar to genetic algorithms

Background:
In the past, we used a brute force methodology to run over 1.2 million tests. A statistician used SAS to analyze the test results and find the optimal values for about a dozen variables.
The analysis can be found here: [login to view URL]
Please read through it for a description of the methodology used. Also note the methodologies that didn't work. The optimal values provided were set, and new (randomized) tests were run to verify that the chosen values performed as expected. The test results showed that the optimal values work. Since that time, we have updated the software by adding features and variable value settings. Now we want to use R to find the values while running the tests, and to have a reusable analysis infrastructure.

R Code the Problem:
After designing the system, you will need to encode the problem in R. The R code will set the variable names and the ranges and/or bounds of each variable. The code will also specify which variables should be maximized or minimized. The goal of the optimization is to find a set of optimal values that gives a predictably high "rank". Rank is a "goodness value" that is used to compare one test result to another. Since the underlying input data (stock prices) is random, rank will not be deterministic. Sometimes the optimal values will return a low value (which may even be a negative number) and sometimes they will return a high value.

Here are the various variables and our goal for each:

Rank: Rank is the "goodness value". It is calculated by subtracting the setprofit from the godprofit results. R will need to determine which variable settings result in the best average rank (as well as the probability).
setprofit: setprofit is calculated by buying all of the symbols and selling them after the test duration. If possible, R should try to maximize the setprofit (by selecting the best symbols to work with).
godprofit: godprofit is calculated by running an algorithm over the test duration and calculating the profit. If possible, select symbols to maximize godprofit so that, in turn, the average rank will be high.

Parent Test Loop:
The software algorithm is set up to process tests in batches.
This way there can be an "apples to apples" comparison of the test results. Therefore, when R selects a test to run to get a specific test result, it will actually get at least a dozen test results. Batch processing is also much more efficient, as it requires far less writing and transferring of data.

startdate: A date chosen at random from the range of dates in the database.
startquote: A decimal number (e.g. 1.01). It is the lowest price a symbol can have on the random date.
set: R will select a set of symbols to be tested based on their negative correlation strength.

Child Test Loop:
Once the above values are selected, we should set up quantmod (or another script) to loop through and process the following variables as a related batch:

frequency: An integer that governs how frequently the algorithm processes the dates in the test duration.
density: An integer (1 - 4) chosen to omit specific symbols before running the tests.
level: An integer (1 - 10).
positions: An integer (1 - 10). The algorithm can process all 10 positions at one time; however, this variable depends on the quantity and level values being equal or higher.
quantity: An integer (2 - 50) which is the number of symbols selected for the tests. Minimize this value so that there are fewer symbols to handle logistically on a daily basis.
duration: An integer (1 - 730) which is the number of days (date range) of the tests. Minimize this value so that the godprofit and rank are high in the shortest timeframe possible.
hold: A value (yes / no) or (1 / 0).
diversify: A value (yes / no) or (1 / 0). Setting diversify to "yes" should be less risky than "no".
startcash: An integer (100 - 100,000) which is the amount of cash used for all of the positions in the test. Minimize this value (since it should be less risky to divide the capital over multiple positions).
cost: An integer (0 - 25) which is the amount of money paid to the broker for each transaction.
The cost is actually set by the broker. Maximize this value to find out how much cost reduces rank.
sharebuffer: An integer (1 - 5000). Maximize this value to reduce slippage and the logistics of trading.
profitlimitpercent: An integer (0 - 10000). Minimize this value to tell the system to "quit while its rank is high".
stoplosspercentage: An integer (1 - 100). The value represents the amount of cash left over after a loss in a position. Maximize this value to reduce the 'drawdown' for each individual position.
buystop_percentage: An integer (1 - 10000). Minimize this value to tell the algo to "quit" if there is an unusually high performance for a symbol price while the algo is holding a position in it.
sellstoppercentage: An integer (0 - 100). Maximize this value to reduce the drawdown while the algo holds a position.
trailingpercentage: An integer (0 - 100). Maximize this value to reduce the drawdown after a position price increase.
pricebufferpercent: An integer (0 - 100). Maximize this value to discover how high the pricebufferpercent can go before it reduces rank. That setting will be used to control the amount of slippage the algo will allow.

Extra values:

drift: A decimal number (-100 to 10000). This value can be positive or negative. It is the change in price after the algo's last transaction. Reducing the duration should reduce drift and return less random results.
number of transactions: An integer (0 - 730). Reduce this value so that there are fewer logistical problems.
correlationdays: An integer (0 - 90). This is the number of days over which R + testForCoint will test for a correlation between the symbols in the set. Minimize the number of correlationdays to increase the number of symbols that can be selected. Correlations between symbols probably break down over time.
correlationpercent: An integer (-100 to 100). Minimize this variable to find the strongest negative correlation for the symbols in the set.
This percentage value should be the 'minimum strength' of the correlation.

Testing Process:

Decide what test to run:
There will be a small R script that defines the scope of variable values. R will look at the scope of the test and decide what set of variable values to test. R will need to analyze the database to figure out what settings to test next. R will call an MIP solver to optimize a ranking value.

Record the test variable values:
R will record the variable values in a MySQL database. The values will be stored in a settings table called 'dayoptions' and in a test table (so that each rank result can be associated with the values). The settings in the dayoptions table will be used by the algo during the test.

Select symbols for the Set:
R will select the symbols to be traded. Have R take the random startdate and select symbols. Next, it will go back 'correlationdays' days in time and select the symbols that have a strong negative correlation. The symbols do not need to be grouped in pairs, and ideally each symbol will be negatively related to several other symbols in the set. This symbol selection infrastructure will be routinely used to find negatively correlated symbols (so it should be set up to be reasonably easy to use). The symbols and prices will be in MySQL tables. R must select a random date that is at least 'duration' days before the maximum date in the historical quotes database. The random startdate must also be at least 'correlationdays' days after the minimum date in the database. If preferred, some of the symbol selection functionality can be coded in a PHP script (rather than R). After the symbols are selected, the set of symbols should be recorded in the 'symbols' and 'test' tables.

Run the tests:
R can use quantmod (or we can code a PHP script) to loop through time and run the batch of tests. The code will need to call the algo for each time increment. If there is some sort of malfunction, the algo will record a notation in the database.
The statistician/R programmer can decide which script will handle the testing loop. The testing framework will record the setprofit, godprofit and rank in the test table. Once the algo is done running, it can send a response to R. The test process will be looped until the average 'rank' value has been optimized.

Check the optimization:
The historical data will be split into parts. One part will be used to optimize the values. Next, the tests will be run again on another part of the data to make sure the optimal values are valid.
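
To illustrate the start date rule from the "Select symbols for the Set" step, here is a minimal sketch. It is written in Python only to keep it short; the real implementation would be in R (or PHP) against the MySQL quotes tables. The function name and its arguments are hypothetical and are not part of the existing system.

```python
import random
from datetime import date, timedelta


def pick_startdate(min_date, max_date, duration, correlationdays, rng=random):
    """Pick a random start date that leaves room on both sides of the history.

    Assumption: min_date and max_date are the first and last dates in the
    historical quotes database; duration and correlationdays are in days.
    """
    # Need 'correlationdays' of history before the start date...
    earliest = min_date + timedelta(days=correlationdays)
    # ...and 'duration' days of quotes after it.
    latest = max_date - timedelta(days=duration)
    if earliest > latest:
        raise ValueError("history too short for this duration/correlationdays")
    span = (latest - earliest).days
    return earliest + timedelta(days=rng.randint(0, span))
```

Passing a seeded `random.Random` instance as `rng` would make a batch of tests reproducible, which may help when re-running a failed test.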
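
The symbol selection step could look roughly like the sketch below. It is an assumption-heavy illustration: it uses a plain Pearson correlation on the price series as a stand-in for the cointegration test mentioned above, treats correlationpercent as the threshold a pair must fall at or below, and adds a hypothetical `min_partners` parameter for "negatively related to several other symbols". Python is used for brevity; the names are not from the existing system.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_negatively_correlated(prices, correlationpercent, min_partners=1):
    """Return symbols negatively correlated with at least min_partners others.

    prices: dict mapping symbol -> list of prices over the 'correlationdays'
            window before the start date (assumed to be the same length).
    correlationpercent: threshold in percent, e.g. -50 means r <= -0.5.
    """
    threshold = correlationpercent / 100.0
    symbols = sorted(prices)
    partners = {s: 0 for s in symbols}
    # Compare every unordered pair once and count qualifying partners.
    for i, a in enumerate(symbols):
        for b in symbols[i + 1:]:
            if pearson(prices[a], prices[b]) <= threshold:
                partners[a] += 1
                partners[b] += 1
    return [s for s in symbols if partners[s] >= min_partners]
```

Raising `min_partners` favors symbols that offset several others in the set rather than a single paired symbol.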
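
Because rank is noisy, each candidate setting has to be judged by its average rank over a batch. The sketch below shows that idea with a plain random search, which is deliberately simpler than the MIP or stochastic solvers named above and is not a proposal for the final method. It is in Python for brevity; `BOUNDS` covers only three of the listed variables, and `make_toy_batch` is a made-up stand-in for the real PHP algorithm.

```python
import random

# Hypothetical subset of the variables, with the integer ranges from the brief.
BOUNDS = {"level": (1, 10), "quantity": (2, 50), "duration": (1, 730)}


def rank(godprofit, setprofit):
    """Rank as described in the brief: godprofit minus setprofit."""
    return godprofit - setprofit


def average_rank(results):
    """Average rank over one batch of (godprofit, setprofit) pairs."""
    ranks = [rank(g, s) for g, s in results]
    return sum(ranks) / len(ranks)


def random_search(run_batch, bounds, n_candidates, rng):
    """Try random settings and keep the one with the best average rank."""
    best_settings, best_avg = None, float("-inf")
    for _ in range(n_candidates):
        settings = {name: rng.randint(lo, hi) for name, (lo, hi) in bounds.items()}
        avg = average_rank(run_batch(settings))
        if avg > best_avg:
            best_settings, best_avg = settings, avg
    return best_settings, best_avg


def make_toy_batch(rng, batch_size=12):
    """Stand-in for the real algorithm: noisy profits that peak at level 7."""
    def run_batch(settings):
        base = 100 - 10 * abs(settings["level"] - 7)
        return [(base + rng.gauss(0, 5), 20.0) for _ in range(batch_size)]
    return run_batch
```

A call such as `random_search(make_toy_batch(random.Random(1)), BOUNDS, 50, random.Random(2))` returns the best settings found and their average rank. In the real system, `run_batch` would write the settings to the dayoptions table, trigger the algo, and read setprofit and godprofit back from the test table.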
Project ID: 2280923

About the project

Remote project
Active 12 years ago


About the client

Atlanta, United States
4.9
61
Member since: April 30, 2003

Client verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)